Preface



The Intel i860™ Microprocessor (part number 80860) delivers supercomputer-level performance
in a single VLSI component. The 64-bit design of the i860 Microprocessor balances integer, 
floating point, and graphics performance for applications such as engineering workstations, 
scientific computing, 3-D graphics workstations, and multiuser systems. Its parallel architecture 
achieves high throughput with RISC design techniques, pipelined processing units, wide data 
paths, large on-chip caches, and fast one micron CHMOS IV silicon technology. 

This book is the basic source of the detailed information that enables software designers and 
programmers to use the i860 Microprocessor. This book explains all programmer-visible features 
of the architecture. 

Although the principal users of this Programmer's Reference Manual will be programmers, it also
contains information of value to systems designers and to administrators of software projects.
Readers in these latter categories may choose to read only the higher-level sections of the
manual, skipping much of the programmer-oriented detail.

How to Use This Manual

• Chapter 1, "Architectural Overview," describes the i860 Microprocessor "in a nutshell" and
presents for the first time the terms that will be used throughout the book.

• Chapter 2, "Data Types," defines the basic units operated on by the instructions of the i860
Microprocessor.

• Chapter 3, "Registers," presents the processor's database. A detailed knowledge of the
registers is important to programmers, but this chapter may be skimmed by administrators.

• Chapter 4, "Addressing," presents the details of operand alignment, page-oriented virtual
memory, and on-chip caches. Systems designers and administrators may choose to read the
introductory sections of each topic.

• Chapter 5, "Core Instructions," presents detailed information about those instructions that
deal with memory addressing, integer arithmetic, and control flow.

• Chapter 6, "Floating-Point Instructions," presents detailed information about those instructions
that deal with floating-point arithmetic, long-integer arithmetic, and 3-D graphics support. It
also explains how extremely high performance can be achieved by using the parallelism and
pipelining of the i860 Microprocessor.

• Chapter 7, "Traps and Interrupts," deals with both systems- and applications-oriented
exceptions, external interrupts, writing exception handlers, saving the state of the processor
(information that is also useful for task switching), and initialization.

• Chapter 8, "Programming Model," defines standards for the use of many features of the i860
Microprocessor. Software administrators should be aware of the need for standards and
should ensure that they are implemented. Following the standards presented here guarantees
that compilers, applications programs, and operating systems written by different people and
organizations will all work together.

• Chapter 9, "Programming Examples," illustrates the use of the i860 Microprocessor by 
presenting short code sequences in assembly language. 

• The appendices present instruction formats and encodings, timing information, and summaries 
of instruction characteristics. These appendices are of most interest to assembly-language 
programmers and to writers of assemblers, compilers, and debuggers. 

Related Documentation 

The following books contain additional material concerning the i860 Microprocessor: 

• i860 64-bit Microprocessor (Data Sheet), order number 240296 

• i860 Microprocessor Assembler and Linker Reference Manual, order number 240436 

• i860 Microprocessor Simulator-Debugger Reference Manual, order number 240437 

Notation and Conventions 

The instruction chapters contain an algorithmic description of each instruction that uses a notation 
similar to that of the Algol or Pascal languages. The metalanguage uses the following special 
symbols: 

• A ← B indicates that the value of B is assigned to A.

• Compound statements are enclosed between the keywords of the "if" statement (IF ...,
THEN ..., ELSE ..., FI) or of the "do" statement (DO ..., OD).

• The operator ++ indicates autoincrement addressing. 

• Register names and instruction mnemonics are printed in a contrasting typestyle to make 
them stand out from the text; for example, dirbase. Individual programming languages may 
require the use of lowercase letters. 

Hexadecimal constants are written, according to the C language convention, with the prefix 0x.
For example, 0x0F is a hexadecimal number that is equivalent to decimal 15.

Reserved Bits and Software Compatibility 

In many register and memory layout descriptions, certain bits are marked as reserved or undefined. 
When bits are thus marked, it is essential for compatibility with future processors that software 
not utilize these bits. Software should follow these guidelines in dealing with reserved or undefined 
bits: 

• Do not depend on the states of any reserved or undefined bits when testing the values of 
registers that contain such bits. Mask out the reserved and undefined bits before testing. 
• Do not depend on the states of any reserved or undefined bits when storing them in memory
or in another register.

• Do not depend on the ability to retain information written into any reserved or undefined bits.

• When loading a register, always load the reserved and undefined bits as zeros or reload them 
with values previously stored from the same register. 
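The guidelines above can be sketched in C as two helpers: one masks out reserved bits before testing a field, and one builds the value to load so that the reserved bits are either zero or the values previously read from the same register. The mask `RESERVED_MASK` and both function names are hypothetical; the actual reserved positions come from the layout of each register.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical mask of reserved/undefined bits for some 32-bit register.
   The real positions depend on the layout documented for each register. */
#define RESERVED_MASK 0x0000FF00u

/* Test a field only after masking out the reserved and undefined bits. */
bool field_is_set(uint32_t reg_value, uint32_t field_mask)
{
    return (reg_value & ~RESERVED_MASK & field_mask) != 0;
}

/* Build the value to load into the register: defined bits come from
   new_bits; reserved bits are either zero or the values previously
   read (saved_value) from the same register. */
uint32_t value_to_load(uint32_t new_bits, uint32_t saved_value,
                       bool preserve_reserved)
{
    uint32_t reserved = preserve_reserved ? (saved_value & RESERVED_MASK) : 0u;
    return (new_bits & ~RESERVED_MASK) | reserved;
}
```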

NOTE 

Depending upon the values of reserved or undefined bits makes software dependent 
upon the unspecified manner in which the i860 Microprocessor handles these bits. 
Depending upon values of reserved or undefined bits risks making software 
incompatible with future processors that define usages for these bits. AVOID ANY 
SOFTWARE DEPENDENCE UPON THE STATE OF RESERVED OR UNDEFINED BITS.
Chapter 1
Architectural Overview 



The Intel i860™ 64-bit Microprocessor defines a complete architecture that balances integer, 
floating point, and graphics performance. Target applications include engineering workstations, 
scientific computing, 3-D graphics workstations, and multiuser systems. Its parallel architecture 
achieves high throughput with RISC design techniques, pipelined processing units, wide data 
paths, and large on-chip caches. 

1.1 OVERVIEW 

The i860 Microprocessor supports more than just integer operations. The architecture includes on
a single chip:

• Integer operations

• Floating-point operations

• Graphics operations

• Memory-management support

• Data and instruction caches



Having a data cache as an integral part of the architecture provides support for vector operations. 
The data cache supports integer programs in the conventional manner, without explicit 
programming. For vector operations, however, programmers can explicitly use the data cache as 
if it were a large block of vector registers. 

To sustain high performance, the i860 Microprocessor incorporates wide information paths that 
include: 

• 64-bit external data bus 

• 128-bit on-chip data bus 

• 64-bit on-chip instruction bus 

Floating-point vector operations use all three busses. 

To drive the graphics and floating point hardware, the i860 Microprocessor includes a RISC 
integer core processing unit with one-clock instruction execution. This unit also processes 
conventional integer programs. It provides complete support for standard operating systems, such 
as UNIX and OS/2. 

The i860 Microprocessor supports vector floating-point operations without special vector 
instructions or vector registers. It accomplishes this by using the on-chip data cache and a variety 
of parallel techniques that include: 

• Pipelined instruction execution with delayed branch instructions to avoid breaks in the 
pipeline. 
• Instructions that automatically increment index registers so as to reduce the number of
instructions needed for vector processing. 

• Parallel integer core and floating-point processing units. 

• Parallel multiplier and adder units within the floating-point unit. 

• Pipelined floating-point hardware units, with both scalar (nonpipelined) and vector (pipelined) 
variants of floating-point instructions. Software can switch between scalar and pipelined 
modes. 

• Large register set with 32 general-purpose integer registers, each 32 bits wide, and 32
floating-point registers, each 32 bits wide, that can also be configured as 64- and 128-bit
registers. The floating-point registers also serve as the staging area for data going into and out 
of the floating-point pipelines. 

There are two classes of instructions: 

• Core instructions (executed by the integer core unit). 

• Floating-point and graphics instructions (executed by the floating-point unit and graphics 
unit). 

The processor has a dual-instruction mode that can simultaneously execute one instruction from
each class (core and floating-point). Software can switch between dual- and single-instruction
modes. Within the floating-point unit, special dual-operation instructions (add-and-multiply,
subtract-and-multiply) use the adder and multiplier units in parallel. With both dual-instruction
mode and dual-operation instructions, the i860 Microprocessor can execute three operations
simultaneously.

The integer core unit manages data flow and loop control for the floating-point units. Together,
they efficiently execute such common tasks as evaluating systems of linear equations, performing
the Fast Fourier Transform (FFT), and performing graphics transformations.

1.2 INTEGER CORE UNIT 

The core unit is the administrative center of the i860 Microprocessor. The core unit fetches both 
integer and floating-point instructions. It contains the integer register file, and decodes and 
executes load, store, integer, bit, and control-transfer operations. Its pipelined organization with 
extensive bypassing and scoreboarding maximizes performance. 

Its instruction categories are:

• Loads and stores between memory and the integer and floating-point registers. Floating-point
loads can be pipelined in three levels. A pixel store instruction contributes to efficient
hidden-surface elimination.

• Transfers between the integer registers and the floating-point registers. 
• Integer arithmetic for 32-bit signed and unsigned numbers. The 32-bit operations can also
perform arithmetic on smaller (8- or 16-bit) integers. Arithmetic on large (128-bit or greater) 
integers can be implemented via short software macros or subroutines. (The graphics unit 
provides arithmetic for 64-bit integers.) 

• Shifts of the integer registers. 

• Logical operations on the integer registers. 

• Control transfers. There are both direct and indirect branches, a call instruction, and a branch 
that can be used to form highly efficient loops. Many of these are delayed transfers that avoid 
breaks in the instruction pipeline. One instruction provides efficient loop control by combining 
the testing and updating of the loop index with a delayed control transfer. 

• System control functions. 
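A multiword add of the kind such a macro performs can be sketched in C: the 32-bit words are added in turn, and the carry out of each word is propagated into the next. The function name and the least-significant-word-first layout are illustrative assumptions, not Intel's library interface.

```c
#include <stdint.h>

/* Add two 128-bit unsigned integers held as four 32-bit words,
   least significant word at index 0. Returns the carry out of bit 127.
   sum may alias a or b, because each word is read before it is written. */
uint32_t add128(uint32_t sum[4], const uint32_t a[4], const uint32_t b[4])
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t t = a[i] + carry;   /* wraps only when a[i] == 0xFFFFFFFF and carry == 1 */
        uint32_t c1 = (t < carry);   /* carry produced by adding the incoming carry */
        sum[i] = t + b[i];
        uint32_t c2 = (sum[i] < t);  /* carry produced by adding b[i] */
        carry = c1 | c2;             /* at most one of c1, c2 can be set */
    }
    return carry;
}
```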

1.3 FLOATING-POINT UNIT 

The floating-point unit contains the floating-point register file. This file can be accessed as
8 × 128-bit registers, 16 × 64-bit registers, or 32 × 32-bit registers.

The floating-point unit contains both the floating-point adder and the floating-point multiplier. The 
adder performs floating-point addition, subtraction, comparison, and conversions. The multiplier 
performs floating-point and integer multiply and floating-point reciprocal operations. Both units 
support 64- and 32-bit floating-point values in IEEE Standard 754 format. Each of these units 
uses pipelining to deliver up to one result per clock. The adder and multiplier can operate in 
parallel, producing up to two results per clock. Furthermore, the floating-point unit can operate in 
parallel with the core unit, sustaining the two-result-per-clock rate by overlapping administrative 
functions with floating-point operations.

The RISC design philosophy minimizes circuit delays and allows all of the available chip space to
be used for maximum floating-point performance. Because of this design philosophy, the pipelining
and parallelism of the floating-point unit, and the wide on-chip caches, the i860 Microprocessor
achieves extremely high levels of floating-point performance.

The use of RISC design principles implies that the i860 Microprocessor does not have high-level
math macroinstructions. High-level math (and other) functions are implemented in software
macros and libraries. For example, the i860 Microprocessor does not have a sin instruction; the
sin function is implemented in software. The sin routine for the i860 Microprocessor is
nevertheless very fast because of the extremely high speed of the basic floating-point operations.
Commonly used math operations, such as the sin function, are offered by Intel as part of a
software library.
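To illustrate how a function such as sin can be built from the basic operations, the C sketch below reduces the argument and evaluates a truncated Taylor polynomial by Horner's rule, using only additions, multiplications, and a truncating conversion. This textbook method is chosen for clarity and is not the algorithm in Intel's library; a production routine would use a more careful argument reduction and minimax coefficients.

```c
/* Approximate sin(x) with adds, multiplies, and a truncating conversion.
   Intended for moderate |x|; very large arguments need an
   extra-precision range reduction that this sketch omits. */
double soft_sin(double x)
{
    const double pi = 3.14159265358979323846;
    const double inv_two_pi = 0.15915494309189533577;   /* 1 / (2*pi) */

    /* Subtract the nearest multiple of 2*pi, leaving x in about [-pi, pi]. */
    double k = (double)(long long)(x * inv_two_pi + (x >= 0.0 ? 0.5 : -0.5));
    x -= k * (2.0 * pi);

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > pi / 2.0)
        x = pi - x;
    else if (x < -pi / 2.0)
        x = -pi - x;

    /* x - x^3/3! + x^5/5! - ... - x^15/15!, evaluated by Horner's rule in x^2. */
    double x2 = x * x;
    double p = -1.0 / 1307674368000.0;   /* -1/15! */
    p = p * x2 + 1.0 / 6227020800.0;     /*  1/13! */
    p = p * x2 - 1.0 / 39916800.0;       /* -1/11! */
    p = p * x2 + 1.0 / 362880.0;         /*  1/9!  */
    p = p * x2 - 1.0 / 5040.0;           /* -1/7!  */
    p = p * x2 + 1.0 / 120.0;            /*  1/5!  */
    p = p * x2 - 1.0 / 6.0;              /* -1/3!  */
    p = p * x2 + 1.0;
    return x * p;
}
```

On [-π/2, π/2] the first omitted term, x^17/17!, is below 1e-11, which bounds the truncation error of this polynomial.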

The floating-point data types, floating-point instructions, and exception handling all support the 
IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985) with both single- 
and double-precision floating-point data types. Due to the low-level instruction set of the i860 
Microprocessor, not all functions defined by the standard are implemented directly by the 
hardware. The i860 Microprocessor supplies the underlying data types, instructions, exception 
checking, and traps to make it possible for software to implement the remaining functions of the 
standard efficiently. Intel supplies a software library that provides programs for the i860
Microprocessor with full IEEE-compatible arithmetic. 

1.4 GRAPHICS UNIT 

The graphics unit has special 64-bit integer logic that supports 3-D graphics drawing algorithms. 
This unit can operate in parallel with the core unit. It contains the special-purpose MERGE 
register, and performs multiple additions on integers stored in the floating-point register file. 

These special graphics features focus the chip's high performance on applications that involve 
three-dimensional graphics with Gouraud or Phong color intensity shading and hidden surface 
elimination via the Z-buffer algorithm. The graphics features of the i860 Microprocessor assume 
that: 

• The surface of a solid object is drawn with polygon patches whose shapes approximate the 
original object. 

• The color intensities of the vertices of the polygon and their distances from the viewer are 
known, but the distances and intensities of the other points must be calculated by interpolation. 

The graphics instructions of the i860 Microprocessor directly aid such interpolation. Furthermore, 
the i860 Microprocessor recognizes the pixel as an 8-, 16-, or 32-bit data type. It can compute 
individual red, blue, and green color intensity values within a pixel; but it does so with parallel 
operations that take advantage of the 64-bit internal word size and 64-bit external data bus. 
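The idea of operating on several pixel fields at once can be sketched in portable C with a SIMD-within-a-register add: four 16-bit fields packed in a 64-bit word are summed lane by lane, and no carry crosses from one field into the next. This generic technique is shown for illustration only; it does not model the i860 pixel-add instructions or the MERGE register.

```c
#include <stdint.h>

/* Add four independent 16-bit lanes packed in 64-bit words, modulo 2^16 per lane. */
uint64_t add_16bit_lanes(uint64_t a, uint64_t b)
{
    const uint64_t high_bits = 0x8000800080008000ull;   /* top bit of each lane */

    /* Add the low 15 bits of every lane; these partial sums cannot carry
       across a lane boundary, and each lane's carry lands in its top bit. */
    uint64_t low = (a & ~high_bits) + (b & ~high_bits);

    /* Top bit of each lane = top(a) XOR top(b) XOR carry-in; the carry out
       of the lane is discarded. */
    return low ^ ((a ^ b) & high_bits);
}
```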

The graphics unit also provides add and subtract operations for 64-bit integers, which are 
especially useful for high-resolution distance interpolation. 
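As a sketch of such interpolation, the C function below steps a distance value across a span by adding a fixed-point increment once per pixel. The 16-bit Z values, the 32.32 fixed-point format held in a 64-bit integer, and the function name are assumptions for illustration; the increment itself is computed once per span.

```c
#include <stdint.h>
#include <stddef.h>

/* Linearly interpolate z from z_start to z_end across n pixels using
   32.32 fixed point, so each step is a single 64-bit add. */
void interpolate_z(uint16_t out[], size_t n, uint16_t z_start, uint16_t z_end)
{
    const int64_t one = (int64_t)1 << 32;   /* 1.0 in 32.32 fixed point */
    if (n == 0)
        return;
    if (n == 1) {
        out[0] = z_start;
        return;
    }
    /* Per-pixel increment, computed once per span. */
    int64_t dz = ((int64_t)z_end - (int64_t)z_start) * one / (int64_t)(n - 1);
    /* Start half a unit up so that taking the integer part rounds to nearest. */
    int64_t z = (int64_t)z_start * one + one / 2;
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint16_t)(z >> 32);   /* integer part of the current distance */
        z += dz;                        /* one 64-bit add per pixel */
    }
}
```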

In addition to the special support provided by the graphics unit, many 3-D graphics applications
directly benefit from the parallelism of the core and floating-point units. For example, the 3-D
rotation represented in homogeneous vector notation by

$$
[X \; Y \; Z \; 1] = [x \; y \; z \; 1]
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos t & \sin t & 0 \\
0 & -\sin t & \cos t & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

is just one example of the kind of vector-oriented calculation that can be converted to a
program that takes full advantage of the pipelining, dual-instruction mode, dual operations, and
memory hierarchy of the i860 Microprocessor.
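This rotation is the product of a row vector and a 4 × 4 matrix whose nonzero entries are 1, cos t, sin t, -sin t, cos t, and 1, running down the diagonal band. The plain C sketch below applies such a matrix to a batch of points; each output element is a chain of multiply-add steps, which is the work the i860 would spread across its pipelined multiplier and adder. The function names are hypothetical, and the caller supplies cos t and sin t.

```c
#include <stddef.h>

/* Multiply row vectors [x y z w] by a 4x4 matrix: out[p] = in[p] * m.
   out must not alias in. */
void transform_points(double out[][4], double in[][4], size_t n, double m[4][4])
{
    for (size_t p = 0; p < n; p++) {
        for (int j = 0; j < 4; j++) {
            double acc = 0.0;
            for (int k = 0; k < 4; k++)
                acc += in[p][k] * m[k][j];   /* one multiply and one add per term */
            out[p][j] = acc;
        }
    }
}

/* Fill m with the rotation about the x-axis, given c = cos t and s = sin t. */
void rotation_x(double m[4][4], double c, double s)
{
    const double r[4][4] = {
        {1.0, 0.0, 0.0, 0.0},
        {0.0,   c,   s, 0.0},
        {0.0,  -s,   c, 0.0},
        {0.0, 0.0, 0.0, 1.0},
    };
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            m[i][j] = r[i][j];
}
```

With c = 0 and s = 1 (a 90-degree rotation), the point [0 1 0 1] maps to [0 0 1 1].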
1.5 MEMORY MANAGEMENT UNIT

The on-chip MMU of the i860 Microprocessor translates addresses from the linear logical address
space to the linear physical address space for both data and instruction accesses. Address
translation is optional; when enabled, it uses a two-level structure of page directories and page
tables of 1K entries each. Information from these tables is cached in a 64-entry, four-way
set-associative memory. The i860 Microprocessor provides basic features (bits and traps) to
implement paged virtual memory and to implement user/supervisor protection at the page level,
all compatible with the paged memory management of the 386™ and 486™ microprocessors.
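With page directories and page tables of 1K entries each and 4-Kbyte pages (as in 386 paging), a 32-bit linear address divides into a 10-bit directory index, a 10-bit page-table index, and a 12-bit byte offset. The C sketch below shows that split and the final combination with a page-frame address, assuming the 386-style entry format in which bits 31..12 hold the page frame address. The structure and function names are hypothetical, and presence bits, protection checks, and the translation cache are omitted.

```c
#include <stdint.h>

/* Fields of a 32-bit linear address under two-level paging with 4-Kbyte pages. */
struct linear_fields {
    uint32_t dir;      /* bits 31..22: index into the page directory (0..1023) */
    uint32_t page;     /* bits 21..12: index into the page table (0..1023) */
    uint32_t offset;   /* bits 11..0: byte offset within the page */
};

struct linear_fields split_linear(uint32_t addr)
{
    struct linear_fields f;
    f.dir = addr >> 22;
    f.page = (addr >> 12) & 0x3FFu;
    f.offset = addr & 0xFFFu;
    return f;
}

/* Combine a page-table entry (frame address in bits 31..12; the low
   control bits are ignored here) with the byte offset. */
uint32_t physical_address(uint32_t pte, uint32_t offset)
{
    return (pte & 0xFFFFF000u) | (offset & 0xFFFu);
}
```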

1.6 CACHES 

In addition to the page translation cache mentioned previously, the i860 Microprocessor contains 
separate on-chip caches for data and instructions. Caching is transparent, except to systems 
programmers who must ensure that the data cache is flushed when switching tasks or changing 
system memory parameters. The on-chip cache controller also provides the interface to the 
external bus with a pipelined structure that allows up to three outstanding bus cycles. 

The instruction cache is a two-way, set-associative memory of four Kbytes, with 32-byte blocks. 
The data cache is a write-back cache, composed of a two-way, set-associative memory of eight 
Kbytes, with 32-byte blocks. 
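The cache sizes determine the number of sets: sets = size / (ways × block size), giving 4096 / (2 × 32) = 64 sets for the instruction cache and 8192 / (2 × 32) = 128 sets for the data cache. The C sketch below computes this and splits an address into tag, set index, and byte offset. It assumes the conventional mapping in which the set index comes from the address bits just above the block offset; that mapping and the names used are assumptions, not statements from this manual.

```c
#include <stdint.h>

struct cache_geometry {
    uint32_t size_bytes;
    uint32_t ways;
    uint32_t block_bytes;
};

/* Number of sets = size / (ways * block size). */
uint32_t cache_sets(struct cache_geometry g)
{
    return g.size_bytes / (g.ways * g.block_bytes);
}

/* Split an address into tag, set index, and byte offset within the block.
   Assumes (hypothetically) that the set index is taken from the address
   bits immediately above the block offset. */
void cache_split(struct cache_geometry g, uint32_t addr,
                 uint32_t *tag, uint32_t *set, uint32_t *offset)
{
    uint32_t sets = cache_sets(g);
    *offset = addr % g.block_bytes;
    *set = (addr / g.block_bytes) % sets;
    *tag = addr / (g.block_bytes * sets);
}
```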

1.7 PARALLEL ARCHITECTURE 

The i860 Microprocessor offers a high level of parallelism in a form that is flexible enough to be
applied to a wide variety of processing styles:

• Conventional programs and conventional compilers can use the i860 Microprocessor as a
scalar machine and still benefit from the high performance of the i860 Microprocessor.

• Compilers designed for the vector model can treat the i860 Microprocessor as a vector 
machine. 

• New instruction-scheduling technology for compilers can compare the processing requirements 
and data dependencies of programs with the available resources of the i860 Microprocessor, 
and can take maximum advantage of its dual-instruction mode, pipelining, and caching. 

An established compiler technology for the vector model of computation already exists. This 
technology can be applied directly to the i860 Microprocessor. The key to treating the i860 
Microprocessor as a vector machine is choosing the appropriate vector primitives that the compiler 
assumes are available on the target machine. (Intel has defined a standard set of vector primitives.) 
The vector primitives are implemented as hand-coded subroutines; the compiler generates calls to 
these subroutines. If a compiler depends on the traditional concept of vector registers, it can 
implement them by mapping these registers to specific memory addresses. By virtue of frequent 
access to these addresses, the simulated registers will reside permanently in the data cache. 
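The memory-resident vector register can be sketched in C: each simulated register is a fixed, statically allocated array, and a vector primitive is an ordinary subroutine that reads and writes those arrays. Because the same addresses are used over and over, they tend to stay in the data cache. The names `VLEN`, `NVREG`, `vreg`, and the `vprim_*` functions are hypothetical and are not Intel's defined set of vector primitives.

```c
#include <stddef.h>

#define VLEN  64   /* hypothetical length of a simulated vector register */
#define NVREG 8    /* hypothetical number of simulated vector registers */

/* Simulated vector registers: fixed memory addresses that stay cache-resident
   because they are accessed frequently. */
static double vreg[NVREG][VLEN];

/* Copy up to VLEN elements from a user array into vector register d. */
void vprim_load(int d, const double *src, size_t n)
{
    for (size_t i = 0; i < n && i < VLEN; i++)
        vreg[d][i] = src[i];
}

/* Copy up to VLEN elements from vector register s to a user array. */
void vprim_store(double *dst, int s, size_t n)
{
    for (size_t i = 0; i < n && i < VLEN; i++)
        dst[i] = vreg[s][i];
}

/* vreg[d] = a * vreg[x] + vreg[y] over the first n elements. */
void vprim_axpy(int d, double a, int x, int y, size_t n)
{
    for (size_t i = 0; i < n && i < VLEN; i++)
        vreg[d][i] = a * vreg[x][i] + vreg[y][i];
}
```

A compiler targeting this model would emit calls such as `vprim_load`, `vprim_axpy`, and `vprim_store` where a vector machine would issue vector instructions.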
Existing programs can be upgraded to take better advantage of the parallel architecture of the i860
Microprocessor using vector-oriented technology. Flow analysis or "vectorizing" tools can 
identify parallelism that is implicit in existing programs. When modified (either manually or 
automatically) and compiled by an appropriate compiler for the i860 Microprocessor, these 
programs can achieve even greater performance gain from the i860 Microprocessor. 

Designers of compilers for the i860 Microprocessor will find that the i860 Microprocessor offers 
more flexibility than traditional vector processing. The instruction set of the i860 Microprocessor 
separates addressing functions from arithmetic functions. Two benefits result from this separation: 

1. It is possible to address arbitrary data structures. Data structures are no longer limited to 
vectors, arrays, and matrices. Parallel algorithms can be applied to linked lists (for example) 
as easily as to matrices. 

2. A richer set of operations is available at each node of a data structure. It becomes possible to 
perform different operations at each node, and there is no limit to the complexity of each 
operation. With the i860 Microprocessor, it is no longer necessary to pass all elements of a 
vector several times to implement complex vector operations. 

1.8 SOFTWARE DEVELOPMENT ENVIRONMENT 

The software environment available from Intel for the i860 Microprocessor includes: 

• Assembler, linker, C and FORTRAN compilers, and a FORTRAN vectorizer.

• Libraries of higher-level math functions and IEEE-standard exception support. Intel supplies 
such libraries in a form that can be utilized by a variety of compilers. 

• Simulator and debugger. 

1.8.1 Multiprocessing for High-Performance with Compatibility 

Memory organization of the i860 Microprocessor is compatible with that of the 386™ and 486™
microprocessors (including addresses and page-table entries); all data types are compatible as well 
(both integers and floating-point numbers). The page-oriented virtual memory management of the 
i860 Microprocessor is also compatible with that of the 386 and 486 microprocessors. This level 
of compatibility facilitates use of the i860 Microprocessor in multiprocessor systems with a 386 
or 486 microprocessor. Moreover, complete hardware and software support for such multiprocessor 
systems is available. 

An i860 microprocessor can be used with a 386™, 386SX™, or 486™ microprocessor system.
The i860 microprocessor extends system performance to supercomputer levels, while the
386/386SX/486 microprocessor provides binary compatibility with existing applications. The
compatibility processor provides access to a huge software base supporting a wide variety of I/O devices,
communications protocols, and human-interface methods. The computation-intensive applications 
enjoy the raw computational power of the i860 Microprocessor, while having access to all 
capabilities and resources of the compatibility processor. 
The i860 Microprocessor provides operations for integer and floating-point data. Integer operations
are performed on 32-bit operands with some support also for 64-bit operands. Load and store
instructions can reference 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit operands. Floating-point
operations are performed on IEEE-standard 32- and 64-bit formats. Graphics-oriented instructions
operate on arrays of 8-, 16-, or 32-bit pixels.

Bits within data formats are numbered from zero starting with the least significant bit. Illustrations 
of data formats in this manual show the least significant bit (bit zero) at the right. 

2.1 INTEGER 

An integer is a 32-bit signed value in standard two's complement form. A 32-bit integer can
represent a value in the range -2,147,483,648 ($-2^{31}$) to 2,147,483,647 ($+2^{31} - 1$). Arithmetic
operations on 8- and 16-bit integers can be performed by sign-extending the 8- or 16-bit values to
32 bits, then using the 32-bit operations.

There are also add and subtract instructions that operate on 64-bit long integers. 

Load and store instructions may also reference (in addition to the 32- and 64-bit formats previously 
mentioned) eight- and 16-bit items in memory. When an eight- or 16-bit item is loaded into a 
register, it is converted to an integer by sign-extending the value to 32 bits. When an eight- or 
16-bit item is stored from a register, the corresponding number of low-order bits of the register 
are used. 
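These load and store rules can be modeled in C: an 8- or 16-bit item is sign-extended on load, and a store keeps only the low-order bits of the register. The function names are illustrative.

```c
#include <stdint.h>

/* Byte and halfword loads: sign-extend the item to a 32-bit integer.
   (Converting an out-of-range value to int8_t/int16_t wraps in two's
   complement on all common compilers and is defined that way in C23.) */
int32_t load8(uint8_t mem)   { return (int32_t)(int8_t)mem; }
int32_t load16(uint16_t mem) { return (int32_t)(int16_t)mem; }

/* Byte and halfword stores: keep only the low-order bits of the register. */
uint8_t  store8(int32_t reg)  { return (uint8_t)reg; }
uint16_t store16(int32_t reg) { return (uint16_t)reg; }
```

Combined, these also show 8-bit arithmetic done with 32-bit operations: loading the byte 100 twice, adding, and storing the low byte produces 0xC8, which reloads as -56.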



2.2 ORDINAL 

Arithmetic operations are available for 32-bit ordinals. An ordinal is an unsigned integer. An
ordinal can represent values in the range 0 to 4,294,967,295 ($+2^{32} - 1$).

Also, there are add and subtract instructions that operate on 64-bit ordinals. 

2.3 SINGLE-PRECISION REAL 

A single-precision real (also called "single real") data type is a 32-bit binary floating-point
number. Bit 31 is the sign bit (s); bits 30..23 are the exponent (e); and bits 22..0 are the
fraction (f). In accordance with ANSI/IEEE standard 754, the value of a single-precision real is
defined as follows:

1. If e = 0 and f ≠ 0, or e = 255, then generate a floating-point source-exception trap when
encountered in a floating-point operation.

2. If 0 < e < 255, then the value is $(-1)^s \times 1.f \times 2^{e-127}$. (The exponent adjustment 127
is called the bias.)

3. If e = 0 and f = 0, then the value is signed zero.

The special values infinity, NaN, indefinite, and denormal generate a trap when encountered. The 
trap handler implements IEEE-standard results. (Refer to Table 2-2 for encoding of these special 
values.) 
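The three cases can be sketched in C by extracting s, e, and f from the 32-bit pattern. The function below returns true and the value for supported operands, and returns false for the encodings that would raise the floating-point source exception. The function name is illustrative, and the power of two is formed by repeated doubling or halving to keep the sketch free of library calls.

```c
#include <stdint.h>
#include <stdbool.h>

/* Decode a single-precision bit pattern according to the three cases above.
   Returns false (source-exception trap) for denormals, infinities, and NaNs. */
bool decode_single(uint32_t bits, double *value)
{
    uint32_t s = bits >> 31;             /* bit 31: sign */
    uint32_t e = (bits >> 23) & 0xFFu;   /* bits 30..23: biased exponent */
    uint32_t f = bits & 0x7FFFFFu;       /* bits 22..0: fraction */

    if ((e == 0 && f != 0) || e == 255)
        return false;                    /* case 1: trap */
    if (e == 0) {                        /* case 3: signed zero */
        *value = s ? -0.0 : 0.0;
        return true;
    }
    /* case 2: (-1)^s * 1.f * 2^(e - 127) */
    double m = 1.0 + (double)f * (1.0 / 8388608.0);   /* 1.f, since 2^23 = 8388608 */
    int k = (int)e - 127;
    for (; k > 0; k--) m *= 2.0;
    for (; k < 0; k++) m *= 0.5;
    *value = s ? -m : m;
    return true;
}
```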



2.4 DOUBLE-PRECISION REAL 
A double-precision real (also called "double real") data type is a 64-bit binary floating-point
number. Bit 63 is the sign bit (s); bits 62..52 are the exponent (e); and bits 51..0 are the
fraction (f). In accordance with ANSI/IEEE standard 754, the value of a double-precision real is
defined as follows:

1. If e = 0 and f ≠ 0, or e = 2047, then generate a floating-point source-exception trap when
encountered in a floating-point operation.

2. If 0 < e < 2047, then the value is $(-1)^s \times 1.f \times 2^{e-1023}$. (The exponent adjustment
1023 is called the bias.)

3. If e = 0 and f = 0, then the value is signed zero.

The special values infinity, NaN, indefinite, and denormal generate a trap when encountered. The 
trap handler implements IEEE-standard results. (Refer to Table 2-2 for encoding of these special 
values.) 

A double real value occupies an even/odd pair of floating-point registers. Bits 31..0 are stored in 
the even-numbered floating-point register; bits 63.. 32 are stored in the next higher odd-numbered 
floating-point register. 
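The register placement of a double real can be sketched in C by splitting its 64-bit pattern: bits 31..0 go to the even register, and bits 63..32, which include the sign and exponent, go to the next odd register. The sketch assumes the host `double` uses the IEEE 754 double format; the register-file structure and function name are hypothetical, and the model does not make f0 and f1 read as zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 32 floating-point registers f0..f31. */
typedef struct { uint32_t f[32]; } fp_regs;

/* Place a double in the even/odd pair starting at register n.
   Returns -1 if n is not a valid even register number. */
int put_double(fp_regs *r, int n, double d)
{
    if (n < 0 || n > 30 || (n % 2) != 0)
        return -1;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);         /* reinterpret the 64-bit pattern */
    r->f[n]     = (uint32_t)bits;           /* bits 31..0  -> even register */
    r->f[n + 1] = (uint32_t)(bits >> 32);   /* bits 63..32 -> odd register  */
    return 0;
}
```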

2.5 PIXEL 

A pixel may be 8, 16, or 32 bits long depending on color and intensity resolution requirements. 
Regardless of the pixel size, the i860 Microprocessor always operates on 64 bits worth of pixels 
at a time. The pixel data type is used by two kinds of instructions: 

• The selective pixel-store instruction that helps implement hidden surface elimination. 

• The pixel add instruction that helps implement 3-D color intensity shading. 
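A scalar C sketch of Z-buffer hidden-surface elimination for one 64-bit group of four 16-bit pixels: a new pixel is written only where its distance is closer than the value already in the Z buffer, and a bit mask records which pixels were written. The 16-bit Z values, the smaller-is-closer convention, and the function name are assumptions; the sketch does not model the i860 pixel-store instruction itself.

```c
#include <stdint.h>

/* For each of four lanes, if the new z is closer (smaller) than the stored z,
   update both the Z buffer and the frame buffer.
   Returns a 4-bit mask of the lanes that were written. */
unsigned zbuffer_store4(uint16_t frame[4], uint16_t zbuf[4],
                        const uint16_t new_pixel[4], const uint16_t new_z[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        if (new_z[i] < zbuf[i]) {
            mask |= 1u << i;
            zbuf[i] = new_z[i];
            frame[i] = new_pixel[i];
        }
    }
    return mask;
}
```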

To perform color intensity shading efficiently in a variety of applications, the i860 Microprocessor
defines three pixel formats according to Table 2-1.



Table 2-1. Pixel Formats

Pixel Size   Bits of Color 1   Bits of Color 2   Bits of Color 3   Bits of Other
(in bits)    Intensity*        Intensity*        Intensity*        Attribute (Texture)

8            N (≤ 8) bits of intensity**                           8 - N
16           6                 6                 4                 0
32           8                 8                 8                 8

* The intensity attribute fields may be assigned to colors in any order convenient to the 
application. 

** With 8-bit pixels, up to 8 bits can be used for intensity; the remaining bits can be used for any 
other attribute, such as color. The intensity bits must be the low-order bits of the pixel. 

Figure 2-1 illustrates one way of assigning meaning to the fields of pixels. These assignments are 
for illustration purposes only. The i860 Microprocessor defines only the field sizes, not the specific 
use of each field. Other ways of using the fields of pixels are possible. 
[Figure: example field assignments for 16-bit and 32-bit pixels. The 32-bit example divides the
pixel into R, G, B, and T fields, with bit labels 31, 23, and 15 marking field boundaries.]

I: INTENSITY, R: RED INTENSITY, G: GREEN INTENSITY, B: BLUE INTENSITY, C: COLOR,
T: TEXTURE

THESE ASSIGNMENTS OF SPECIFIC MEANINGS TO THE FIELDS OF PIXELS ARE FOR
ILLUSTRATION PURPOSES ONLY. ONLY THE FIELD SIZES ARE DEFINED, NOT THE SPECIFIC
USE OF EACH FIELD.

Figure 2-1. Pixel Format Example
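Packing and unpacking pixel fields reduce to shifts and masks. The C sketch below assumes, following the 32-bit example, that R, G, B, and T occupy bits 31..24, 23..16, 15..8, and 7..0, and, for 16-bit pixels, that the 6-, 6-, and 4-bit intensity fields of Table 2-1 are placed from high to low bits. Both orders are illustrative assumptions, since the i860 Microprocessor defines only the field sizes; the function names are hypothetical.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B, and T fields into a 32-bit pixel (example layout). */
uint32_t pack_rgbt(uint8_t r, uint8_t g, uint8_t b, uint8_t t)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | t;
}

/* Extract one 8-bit field: shift is 24 for R, 16 for G, 8 for B, 0 for T. */
uint8_t pixel_field(uint32_t pixel, unsigned shift)
{
    return (uint8_t)(pixel >> shift);
}

/* Pack a 16-bit pixel with 6-, 6-, and 4-bit intensity fields,
   color 1 in the high bits (assumed order). */
uint16_t pack_664(unsigned c1, unsigned c2, unsigned c3)
{
    return (uint16_t)(((c1 & 0x3Fu) << 10) | ((c2 & 0x3Fu) << 4) | (c3 & 0xFu));
}
```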

2.6 REAL-NUMBER ENCODING 

Table 2-2 presents the complete range of values that can be stored in the single and double real
formats. Not all possible values are directly supported by the i860 Microprocessor. The supported
values are the normals and the zeros, both positive and negative. Other values are not generated
by the i860 Microprocessor, and, if encountered as input to a floating-point instruction, they
trigger the floating-point source exception. Exception-handling software can use the unsupported
values to implement denormals, infinities, and NaNs.
Table 2-2. Single and Double Real Encodings

[Table: encodings of each class of value. Single format: sign bit, 8-bit biased exponent, 23-bit
fraction. Double format: sign bit, 11-bit biased exponent, 52-bit fraction. The integer bit of the
significand is implied and not stored.]
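The classes of Table 2-2 can be distinguished from the exponent and fraction fields alone. The C sketch below does this for the double format and reports whether the i860 Microprocessor would take the floating-point source exception for the operand; the enumeration and function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

enum real_class { RC_ZERO, RC_NORMAL, RC_DENORMAL, RC_INFINITY, RC_NAN };

/* Classify a double-real bit pattern by its 11-bit exponent and 52-bit fraction. */
enum real_class classify_double(uint64_t bits)
{
    uint32_t e = (uint32_t)(bits >> 52) & 0x7FFu;     /* biased exponent */
    uint64_t f = bits & 0x000FFFFFFFFFFFFFull;        /* fraction */
    if (e == 0)
        return f == 0 ? RC_ZERO : RC_DENORMAL;
    if (e == 0x7FFu)
        return f == 0 ? RC_INFINITY : RC_NAN;
    return RC_NORMAL;
}

/* Only zeros and normals are supported directly; all other encodings
   trigger the floating-point source exception. */
bool raises_source_exception(uint64_t bits)
{
    enum real_class c = classify_double(bits);
    return c != RC_ZERO && c != RC_NORMAL;
}
```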
As Figure 3-1 shows, the i860 Microprocessor has the following registers:

• An integer register file 

• A floating-point register file 

• Six control registers (psr, epsr, db, dirbase, fir, and fsr) 

• Four special-purpose registers (KR, KI, T, and MERGE) 

The control registers are accessible only by load and store control-register instructions; the integer 
and floating-point registers are accessed by arithmetic operations and load and store instructions. 
The special-purpose registers KR, KI, T, and MERGE are used by a few specific instructions. 
For information about initialization of registers, refer to the reset trap in Chapter 7. For information 
about protection as it applies to registers, refer to the st.c instruction in Chapter 5. 

3.1 INTEGER REGISTER FILE 

There are 32 integer registers, each 32 bits wide, referred to as r0 through r31, which are used for
address computation and scalar integer computations. Register r0 always returns zero when read,
independently of what is stored in it. This special behavior of r0 makes it useful for modifying
the function of certain instructions. For example, specifying r0 as the destination of a subtract
(thereby effectively discarding the result) produces a compare instruction. Similarly, using r0 as
one source operand of an OR instruction produces a test-for-zero instruction.
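A small C model makes the r0 idiom concrete: reads of r0 return zero and writes to it are stored but never visible, so an OR of a register with r0 yields that register's value, and a zero result identifies a zero operand. The register-file structure, the function names, and the returned zero flag are hypothetical; the condition-code rules of each instruction are given in Chapter 5.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the 32 integer registers r0..r31. */
typedef struct { uint32_t r[32]; } int_regs;

uint32_t reg_read(const int_regs *rf, int n)
{
    return n == 0 ? 0u : rf->r[n];   /* r0 always reads as zero */
}

void reg_write(int_regs *rf, int n, uint32_t v)
{
    rf->r[n] = v;                    /* stored, but never visible for r0 */
}

/* Model of an OR of registers src1 and src2 into dest.
   Returns true when the result is zero. */
bool op_or(int_regs *rf, int src1, int src2, int dest)
{
    uint32_t result = reg_read(rf, src1) | reg_read(rf, src2);
    reg_write(rf, dest, result);
    return result == 0;
}
```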

3.2 FLOATING-POINT REGISTER FILE 

There are 32 floating-point registers, each 32 bits wide, referred to as f0 through f31, which are
used for floating-point computations. Registers f0 and f1 always return zero when read,
independently of what is stored in them. The floating-point registers are also used by a set of 
integer operations, primarily for graphics computations. 

The floating-point registers act as buffer registers in vector computations, while the data cache 
performs the role of the vector registers of a conventional vector processor. 

When accessing 64-bit floating-point or integer values, the i860 Microprocessor uses an even/odd 
pair of registers. When accessing 128-bit values, it uses an aligned set of four registers (f0, f4,
f8, ..., f28). The instruction must designate the lowest register number of the set of registers
containing 64- or 128-bit values. Misaligned register numbers produce undefined results. The 
register with the lowest number contains the least significant part of the value. For 128-bit values, 
the register pair with the lower number contains the 64 bits at the lowest memory address; the 
register pair with the higher number contains the 64 bits at the highest address. 
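The alignment rules can be expressed as a C check on the register designator, together with a model of a 128-bit load into four consecutive registers. The function names are hypothetical. The model assumes that the four 32-bit words are taken from memory in increasing address order, so the lower-numbered pair receives the 64 bits at the lower address, and it rejects misaligned designators, whereas the hardware result for them is undefined.

```c
#include <stdint.h>
#include <stdbool.h>

/* Is n a valid designator for an access of the given width in bits? */
bool fp_designator_ok(int n, int width_bits)
{
    if (n < 0 || n > 31)
        return false;
    switch (width_bits) {
    case 32:  return true;
    case 64:  return n % 2 == 0;   /* even/odd pair */
    case 128: return n % 4 == 0;   /* f0, f4, ..., f28 */
    default:  return false;
    }
}

/* Model of a 128-bit load into f[n..n+3]: mem_words[0] is the word at
   the lowest address and lands in the lowest-numbered register. */
bool load_quad(uint32_t f[32], int n, const uint32_t mem_words[4])
{
    if (!fp_designator_ok(n, 128))
        return false;
    for (int i = 0; i < 4; i++)
        f[n + i] = mem_words[i];
    return true;
}
```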
Figure 3-1. Register Set

3.3 PROCESSOR STATUS REGISTER 

The processor status register (psr) contains miscellaneous state information for the current process. 
Figure 3-2 shows the format of the psr. Fields marked by an asterisk in the figure can be changed 
only in supervisor mode. 

• BR (Break Read) and BW (Break Write) enable a data access trap when the operand address 
matches the address in the db register and a read or write (respectively) occurs. (Refer to 
section 3.5 for more about the db register.) 

• Various instructions set CC (Condition Code) according to tests they perform, as explained 
in Chapter 5. The conditional branch instructions test its value. The bla instruction described 
in Chapter 5 sets and tests LCC (Loop Condition Code). 
Figure 3-2. Processor Status Register (labeled fields include Kill Next Floating-Point Instruction, Shift Count, Pixel Size, Pixel Mask, and reserved bits; * marks fields that can be changed only from supervisor level)

• IM (Interrupt Mode) enables external interrupts if set; disables interrupts if clear. (Chapter 7
covers interrupts.)

• U (User Mode) is set when the i860 Microprocessor is executing in user mode; it is clear
when the i860 Microprocessor is executing in supervisor mode. In user mode, writes to some
control registers are inhibited. This bit also controls the memory protection mechanism
described in Chapter 4.

• PIM (Previous Interrupt Mode) and PU (Previous User Mode) save the corresponding status
bits (IM and U) on a trap, because those status bits are changed when a trap occurs. They are
restored into their corresponding status bits when returning from a trap handler with a branch
indirect instruction when a trap flag is set in the psr. (Chapter 7 provides the details about
traps.)

• FT (Floating-Point Trap), DAT (Data Access Trap), IAT (Instruction Access Trap), IN
(Interrupt), and IT (Instruction Trap) are trap flags. They are set when the corresponding trap
condition occurs. The trap handler examines these bits to determine which condition or
conditions have caused the trap. Refer to Chapter 7 for a more detailed explanation.

• DS (Delayed Switch) is set if a trap occurs during the instruction before dual-instruction 
mode is entered or exited. If DS is set and DIM (Dual Instruction Mode) is clear, the i860 
Microprocessor switches to dual-instruction mode one instruction after returning from the trap 
handler. If DS and DIM are both set, the i860 Microprocessor switches to single-instruction 
mode one instruction after returning from the trap handler. Chapter 7 explains how trap 
handlers use these bits. 

• When a trap occurs, the i860 Microprocessor sets DIM if it is executing in dual-instruction
mode; it clears DIM if it is executing in single-instruction mode. If DIM is set, the i860
Microprocessor resumes execution in dual-instruction mode after returning from the trap
handler.

• When KNF (Kill Next Floating-Point Instruction) is set, the next floating-point instruction is
suppressed (except that its dual-instruction mode bit is interpreted). A trap handler sets KNF
if the trapped floating-point instruction should not be reexecuted. KNF is especially useful for
returning from a trap that occurred in dual-instruction mode, because it permits the core
instruction to be executed while the floating-point instruction is suppressed. KNF is
automatically reset by the i860 Microprocessor when the instruction has been successfully
bypassed. The core instruction may itself cause a trap while the floating-point instruction is
suppressed; in this case KNF remains set, permitting retry of the core instruction.

• SC (Shift Count) stores the shift count used by the last right-shift instruction. It controls the 
number of shifts executed by the double-shift instruction, as described in Chapter 5. 

• PS (Pixel Size) and PM (Pixel Mask) are used by the pixel-store instruction described in
Chapter 5 and by the graphics instructions described in Chapter 6. The values of PS control
pixel size as defined by Table 3-1. The bits in PM correspond to pixels to be updated by the
pixel-store instruction pst.d. The low-order bit of PM corresponds to the low-order pixel of
the 64-bit source operand of pst.d. The number of low-order bits of PM that are actually
used is the number of pixels that fit into 64 bits, which depends upon PS. If a bit of PM is
set, then pst.d stores the corresponding pixel.

Table 3-1. Values of PS

Value    Pixel Size in Bits    Pixel Size in Bytes
00       8                     1
01       16                    2
10       32                    4
11       (undefined)           (undefined)
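
The sketch below shows, under stated assumptions, how PS and PM together select which pixels of a 64-bit pst.d source are written. The helper pst_d_merge is hypothetical. It models only the pixel selection described in this chapter, not any other effect of pst.d (see Chapter 5). Because Table 3-1 leaves PS = 11 undefined, the sketch treats that encoding as a no-op.

```c
#include <stdint.h>

/* Pixel width in bits for each PS encoding (Table 3-1); 0 marks 11 (undefined). */
static const unsigned ps_bits[4] = { 8, 16, 32, 0 };

/* Returns the new 64-bit memory word after a masked pixel store:
   pixel i of src replaces pixel i of mem only if bit i of PM is set. */
uint64_t pst_d_merge(uint64_t mem, uint64_t src, unsigned ps, uint8_t pm)
{
    unsigned bits = ps_bits[ps & 3u];
    if (bits == 0)
        return mem;                        /* undefined PS: leave memory alone */

    unsigned npix = 64u / bits;            /* 8, 4 or 2 pixels; higher PM bits unused */
    uint64_t one_pixel = (1ull << bits) - 1;
    uint64_t sel = 0;
    for (unsigned i = 0; i < npix; i++)
        if (pm & (1u << i))
            sel |= one_pixel << (i * bits);

    return (mem & ~sel) | (src & sel);
}
```

With 16-bit pixels (PS = 01) and PM = 0101, only pixels 0 and 2 of the source are written.
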
3.4 EXTENDED PROCESSOR STATUS REGISTER

The extended processor status register (epsr) contains additional state information for the current 
process beyond that stored in the psr. Figure 3-3 shows the format of the epsr. Fields marked by 
an asterisk in the figure can be changed only in supervisor mode. 



Figure 3-3. Extended Processor Status Register (labeled fields include Processor Type, Stepping Number, Interlock, Write-Protect Mode, Interrupt, Data Cache Size (DCS), Page-Table Bit Mode, Big Endian Mode, Overflow Flag, and reserved bits; * marks fields that can be changed only from supervisor level)

• The processor type is one for the i860 Microprocessor.

• The stepping number has a unique value that distinguishes among different revisions of the
processor.

• IL (Interlock) is set if a trap occurs after a lock instruction but before the load or store
following the subsequent unlock instruction. IL indicates to the trap handler that a locked
sequence has been interrupted.

• WP (Write Protect) controls the semantics of the W bit of page table entries. A clear W bit
in either the directory or the page table entry causes writes to be trapped. When WP is clear,
writes are trapped in user mode, but not in supervisor mode. When WP is set, writes are
trapped in both user and supervisor modes.

• INT (Interrupt) is the value of the INT input pin.

• DCS (Data Cache Size) is a read-only field that tells the size of the on-chip data cache. The
number of bytes actually available is $2^{12+\mathrm{DCS}}$; therefore, a value of zero indicates 4 Kbytes,
one indicates 8 Kbytes, etc.

• PBM (Page-Table Bit Mode) determines which bit of page-table entries is output on the PTB
pin. When PBM is clear, the PTB signal reflects bit CD of the page-table entry used for the
current cycle. When PBM is set, the PTB signal reflects bit WT of the page-table entry used
for the current cycle.

• BE (Big Endian) controls the ordering of bytes within a data item in memory. Normally (i.e. 
when BE is clear) the i860 Microprocessor operates in little endian mode, in which the 
addressed byte is the low-order byte. When BE is set (big endian mode), the low-order three 
bits of all load and store addresses are complemented, then masked to the appropriate 
boundary for alignment. This causes the addressed byte to be the most significant byte. Refer 
to Chapter 4 for more endian information. 

• OF (Overflow Flag) is set by adds, addu, subs, and subu when integer overflow occurs.
For adds and subs, OF is set if the carry from bit 31 is different from the carry from bit 30.
For addu, OF is set if there is a carry from bit 31. For subu, OF is set if there is no carry
from bit 31. Under all other conditions, it is cleared by these instructions. OF controls the
function of the intovr instruction (refer to Chapter 5).

3.5 DATA BREAKPOINT REGISTER 

The data breakpoint register (db) is used to generate a trap when the i860 Microprocessor accesses 
an operand at the address stored in this register. The trap is enabled by BR and BW in psr. When 
comparing, a number of low-order bits of the address are ignored, depending on the size of the
operand. For example, a 16-bit access ignores the low-order bit of the address when comparing
to db; a 32-bit access ignores the low-order two bits. This ensures that any access that overlaps
the address contained in the register will generate a trap.
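
The comparison can be sketched as follows. The function db_trap is hypothetical. The text gives only the 16-bit and 32-bit cases, so the sketch assumes that wider accesses follow the same pattern, ignoring the low-order log2(size) address bits. It applies BR to reads and BW to writes, as described for the psr in Section 3.3.

```c
#include <stdbool.h>
#include <stdint.h>

/* size is the access size in bytes (1, 2, 4, 8 or 16). Address bits below
   the access size are ignored on both sides of the comparison. */
bool db_trap(uint32_t addr, unsigned size, bool is_write,
             uint32_t db, bool br, bool bw)
{
    uint32_t ignore = (uint32_t)size - 1u;        /* e.g. 0x3 for a 32-bit access */
    bool match = (addr & ~ignore) == (db & ~ignore);
    return match && (is_write ? bw : br);
}
```
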

3.6 DIRECTORY BASE REGISTER 

The directory base register dirbase (shown in Figure 3-4) controls address translation, caching, 
and bus options. 



Figure 3-4. Directory Base Register (labeled fields include Address Translation Enable, DRAM Page Size, Bus Lock, I-Cache/TLB Invalidate, and Replacement Block)
• ATE (Address Translation Enable), when set, enables the virtual-address translation algorithm
described in Chapter 4. The data cache must be flushed before changing the ATE bit. 

• DPS (DRAM Page Size) controls how many bits to ignore when comparing the current bus-cycle
address with the previous bus-cycle address to generate the NENE# signal. This feature
allows for higher speeds when using static column or page-mode DRAMs and consecutive
reads and writes access the same column or page. The comparison ignores the low-order 12
+ DPS bits. A value of zero is appropriate for one bank of 256K×n RAMs, 1 for 1M×n
RAMs, etc.

• When BL (Bus Lock) is set, external bus accesses are locked. The LOCK# signal is asserted on
the next bus cycle whose internal bus request is generated after BL is set. It remains asserted on
every subsequent bus cycle as long as BL remains set. The LOCK# signal is deasserted on
the next bus cycle whose internal bus request is generated after BL is cleared. Traps
immediately clear BL and the LOCK# signal and set IL in epsr. In this case the trap handler
should resume execution at the beginning of the locked sequence. The lock and unlock
instructions control the BL bit (refer to Chapter 5).

• ITI (Instruction-Cache, TLB Invalidate), when set in the value that is loaded into dirbase, 
causes the instruction cache and address-translation cache (TLB) to be flushed. The ITI bit 
does not remain set in dirbase. ITI always appears as zero when read from dirbase. The 
data cache must be flushed before invalidating the TLB. 

• When CS8 (Code Size 8-Bit) is set, instruction cache misses are processed as 8-bit bus
cycles. When this bit is clear, instruction cache misses are processed as 64-bit bus cycles.
This bit cannot be set by software; hardware sets this bit at initialization time. It can be
cleared by software (one time only) to allow the system to execute out of 64-bit memory after
bootstrapping from 8-bit EPROM. A nondelayed branch to code in 64-bit memory should
directly follow the st.c instruction that clears CS8, in order to make the transition from 8-bit
to 64-bit memory occur at the correct time. The branch must be aligned on a 64-bit boundary.
Refer to the description of CS8 mode in the i860 Hardware Reference Manual for more information.

• RB (Replacement Block) identifies the cache block to be replaced by cache replacement 
algorithms. The high-order bit of RB is ignored by the instruction and data caches. RB 
conditions the cache flush instruction flush, which is discussed in Chapter 5. Table 3-2 
explains the values of RB. 



Table 3-2. Values of RB

Value    Replace TLB Block    Replace Instruction and Data Cache Block
00       0                    0
01       1                    1
10       2                    0
11       3                    1
• RC (Replacement Control) controls cache replacement algorithms. Table 3-3 explains the
significance of the values of RC. The use of RC and RB to implement data cache flushing
is described in Chapter 4.

• DTB (Directory Table Base) contains the high-order 20 bits of the physical address of the
page directory when address translation is enabled (i.e. ATE = 1). The low-order 12 bits of
the address are zeros (therefore the directory must be located on a 4-Kbyte boundary).





Table 3-3. Values of RC

Value    Meaning
00       Selects the normal replacement algorithm, in which any block in the set may be
         replaced on cache misses in all caches.
01       Instruction, data, and TLB cache misses replace the block selected by RB. The
         instruction and data caches ignore the high-order bit of RB. This mode is used
         for instruction cache and TLB testing.
10       Data cache misses replace the block selected by the low-order bit of RB.
11       Disables data cache replacement.
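
The DPS comparison described for dirbase above can be sketched in C. The function name is hypothetical. The sketch makes no assumption about the width of the DPS field beyond guarding against a shift of 32 or more. Two bus-cycle addresses count as being in the same DRAM page when they agree in every bit above the low-order 12 + DPS bits; this is the comparison that feeds the NENE# signal.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when cur and prev agree in every bit above the low-order 12 + DPS bits. */
bool nene_same_page(uint32_t cur, uint32_t prev, unsigned dps)
{
    unsigned ignored = 12u + dps;
    if (ignored >= 32u)
        return true;                      /* every address bit is ignored */
    return ((cur ^ prev) >> ignored) == 0;
}
```

With DPS = 0, addresses 0x12345 and 0x13000 differ in bit 12 and are treated as different pages. With DPS = 1, bit 12 is ignored and they compare equal.
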



3.7 FAULT INSTRUCTION REGISTER 

When a trap occurs, this register (the fir) contains the address of the instruction that caused the
trap, as described in Chapter 7. If fir is read with ld.c at any time other than the first time after a
trap occurs, it yields the address of that ld.c instruction itself.



3.8 FLOATING-POINT STATUS REGISTER 

The floating-point status register (fsr) contains the floating-point trap and rounding-mode status 
for the current process. Figure 3-5 shows its format. 

• If FZ (Flush Zero) is clear and underflow occurs, a result-exception trap is generated. When 
FZ is set and underflow occurs, the result is set to zero, and no trap due to underflow occurs. 

• If TI (Trap Inexact) is clear, inexact results do not cause a trap. If TI is set, inexact results 
cause a trap. The sticky inexact flag (SI) is set whenever an inexact result is produced, 
regardless of the setting of TI. 

• RM (Rounding Mode) specifies one of the four rounding modes defined by the IEEE standard. 
Given a true result b that cannot be represented by the target data type, the i860 
Microprocessor determines the two representable numbers a and c that most closely bracket 
b in value (a < b < c). The i860 Microprocessor then rounds (changes) b to a or c according
to the mode selected by RM as defined in Table 3-4. Rounding introduces an error in the 
result that is less than one least-significant bit. 
Figure 3-5. Floating-Point Status Register (labeled fields include Flush Zero (FZ), Trap Inexact (TI), Rounding Mode (RM), Update (U), Floating-Point Trap Enable (FTE), Sticky Inexact Flag (SI), Source Exception (SE), Multiplier Underflow, Overflow, Inexact, and Add One (MU, MO, MI, MA), Adder Underflow, Overflow, Inexact, and Add One (AU, AO, AI, AA), Result Register (RR), Adder Exponent (AE), Load, Integer (Graphics), Multiplier, and Adder Pipe Result Precision (LRP, IRP, MRP, ARP), and reserved bits)





Table 3-4. Values of RM

Value    Rounding Mode               Rounding Action
00       Round to nearest or even    Closer to b of a or c; if equally close, select the even
                                     number (the one whose least significant bit is zero).
01       Round down (toward -∞)      a
10       Round up (toward +∞)        c
11       Chop (toward zero)          Smaller in magnitude of a or c.
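
The selection rules of Table 3-4 can be illustrated without floating-point hardware. The sketch below applies them to a fixed-point quantity b = value / 2^frac_bits, so a and c are consecutive integers rather than consecutive representable floating-point numbers. The selection logic is the same, but this is a model for illustration, not the i860's rounding circuitry. The name round_rm and the RM_* constants are hypothetical.

```c
#include <stdint.h>

/* RM encodings from Table 3-4. */
enum { RM_NEAREST = 0, RM_DOWN = 1, RM_UP = 2, RM_CHOP = 3 };

/* Rounds b = value / 2^frac_bits to an integer (requires frac_bits <= 61).
   a and c are the integers that bracket b, as in the text. */
int64_t round_rm(int64_t value, unsigned frac_bits, unsigned rm)
{
    int64_t one = (int64_t)1 << frac_bits;
    int64_t a = value / one;              /* C division truncates toward zero */
    if (value % one != 0 && value < 0)
        a -= 1;                           /* make a = floor(b) */
    int64_t rem = value - a * one;        /* 0 <= rem < one */
    if (rem == 0)
        return a;                         /* b is exactly representable */
    int64_t c = a + 1;

    switch (rm & 3u) {
    case RM_NEAREST:
        if (2 * rem < one) return a;
        if (2 * rem > one) return c;
        return (a % 2 == 0) ? a : c;      /* tie: select the even one */
    case RM_DOWN:
        return a;                         /* toward -infinity */
    case RM_UP:
        return c;                         /* toward +infinity */
    default:                              /* RM_CHOP */
        return (value < 0) ? c : a;       /* smaller in magnitude */
    }
}
```

For example, 2.5 (value 5, one fraction bit) rounds to 2 in round-to-nearest mode, and 3.5 rounds to 4. For -2.5, round down gives -3, while round up and chop both give -2.
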



• The U-bit (Update Bit), if set in the value that is loaded into fsr by a st.c instruction, enables
updating of the result-status bits (AE, AA, AI, AO, AU, MA, MI, MO, and MU) in the
first stage of the floating-point adder and multiplier pipelines. If this bit is clear, the result-status
bits are unaffected by a st.c instruction; st.c ignores the corresponding bits in the value
that is being loaded. A st.c always updates fsr bits 21..17 and 8..0 directly. The U-bit does
not remain set; it always appears as zero when read. A trap handler that has interrupted a
pipelined operation sets the U-bit to enable restoration of the result-status bits in the pipeline.
Refer to Chapter 7 for details.

• The FTE (Floating-Point Trap Enable) bit, if clear, disables all floating-point traps (invalid
input operand, overflow, underflow, and inexact result). Trap handlers clear it while saving
and restoring the floating-point pipeline state (refer to Chapter 7) and when they need to produce
NaN, infinite, or denormal results without generating traps.

• SI (Sticky Inexact) is set when the last-stage result of either the multiplier or adder is inexact
(i.e. when either AI or MI is set). SI is "sticky" in the sense that it remains set until reset
by software. AI and MI, on the other hand, can be changed by the subsequent floating-point
instruction.

• SE (Source Exception) is set when one of the source operands of a floating-point operation is
invalid; it is cleared when all the input operands are valid. Invalid input operands include
denormals, infinities, and all NaNs (both quiet and signaling). Trap handler software can
implement IEEE-standard results for operations on these values.

• When read from the fsr, the result-status bits MA, MI, MO, and MU (Multiplier Add-One,
Inexact, Overflow, and Underflow, respectively) describe the last-stage result of the multiplier.

• When read from the fsr, the result-status bits AA, AI, AO, AU, and AE (Adder Add-One,
Inexact, Overflow, Underflow, and Exponent, respectively) describe the last-stage result of
the adder. The high-order three bits of the 11-bit exponent of the adder result are stored in
the AE field. The trap handler needs the AE bits when overflow or underflow occurs with
double-precision inputs and single-precision outputs.

• After a floating-point operation in a given unit (adder or multiplier), the result-status bits of
that unit are undefined until the point at which result exceptions are reported.

• When written to the fsr with the U-bit set, the result-status bits are placed into the first stage
of the adder and multiplier pipelines. When the processor executes pipelined operations, it
propagates the result-status bits of a particular unit (multiplier or adder) one stage for each
pipelined floating-point operation for that unit. When they reach the last stage, they replace
the normal result-status bits in the fsr.

• In a floating-point dual-operation instruction (e.g. add-and-multiply or subtract-and-multiply),
both the multiplier and the adder may set exception bits. The result-status bits for a particular
unit remain set until the next operation that uses that unit.

• AA (Adder Add One), if set, indicates that the adder rounded the result by adding one least
significant bit.

• MA (Multiplier Add One), if set, indicates that the multiplier rounded the result by adding one
least significant bit.

• RR (Result Register) specifies which floating-point register (f0-f31) was the destination register
when a result-exception trap occurs due to a scalar operation.
• LRP (Load Pipe Result Precision), IRP (Integer (Graphics) Pipe Result Precision), MRP
(Multiplier Pipe Result Precision), and ARP (Adder Pipe Result Precision) aid in restoring
pipeline state after a trap or process switch. Each defines the precision of the last-stage result
in the corresponding pipeline: the bit is set when that result is double precision and clear
when it is single precision. These bits cannot be changed by software.

3.9 KR, KI, T, AND MERGE REGISTERS

The KR and KI ("Konstant") registers and the T (Temporary) register are special-purpose
registers used by the dual-operation floating-point instructions described in Chapter 6. The
MERGE register is used only by the graphics instructions also presented in Chapter 6. Refer to
that chapter for details of their use.
Chapter 4
Addressing 



Memory is addressed in byte units with a paged virtual-address space of $2^{32}$ bytes. Data and
instructions can be located anywhere in this address space. Address arithmetic is performed using
32-bit input values and produces 32-bit results. In case of overflow, the low-order 32 bits of the
result are used.

Normally, multibyte data values are stored in memory in little endian format, i.e. with the least
significant byte at the lowest memory address. As an option that may be dynamically selected by 
software in supervisor mode, the i860 Microprocessor also offers big endian mode, in which the 
most significant byte of a data item is at the lowest address. Code accesses are always done with 
little endian addressing. Figure 4-1 shows the difference between the two storage modes. Big 
endian and little endian data areas should not be mixed within a 64-bit data word. Illustrations of 
data structures in this manual show data stored in little endian mode, i.e. the rightmost (low- 
order) byte is at the lowest memory address. The BE bit of epsr selects the mode, as described 
in Chapter 3. 
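
Chapter 3 describes big endian mode as complementing the low-order three bits of each load and store address and then masking the result to the alignment boundary. A minimal C sketch of that address transformation follows; the function name is hypothetical.

```c
#include <stdint.h>

/* Byte address actually referenced when epsr.BE = 1, for an access of
   size bytes (1, 2, 4 or 8): complement bits 2..0, then align down. */
uint32_t be_effective_address(uint32_t addr, unsigned size)
{
    return (addr ^ 7u) & ~((uint32_t)size - 1u);
}
```

For example, a byte access to address m (a multiple of eight) reaches byte m+7, the most significant byte of the little endian word. A 32-bit access to m reaches the upper word at m+4.
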





LITTLE ENDIAN FORMAT (bits 63..0, one byte per 8 bits): byte addresses m+7, m+6, m+5, m+4, m+3, m+2, m+1, m
BIG ENDIAN FORMAT (bits 63..0, one byte per 8 bits): byte addresses m, m+1, m+2, m+3, m+4, m+5, m+6, m+7
m is the memory address of the word.

Figure 4-1. Memory Formats
4.1 ALIGNMENT

All data types are addressed by specifying their lowest-addressed byte. Alignment requirements
are as follows:

• A 128-bit value is aligned to an address divisible by 16 when referenced in memory (i.e. the 
four least significant address bits must be zero) or a data-access trap occurs. 

• A 64-bit value is aligned to an address divisible by eight when referenced in memory (i.e. 
the three least significant address bits must be zero) or a data-access trap occurs. 

• A 32-bit value is aligned to an address divisible by four when referenced in memory (i.e. the 
two least significant address bits must be zero) or a data-access trap occurs. 

• A 16-bit value is aligned to an address divisible by two when referenced in memory (i.e. the 
least significant address bit must be zero) or a data-access trap occurs. 
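
Every size in the list above is a power of two, so the rules reduce to a single mask test. A sketch follows; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of size bytes (1, 2, 4, 8 or 16) at addr meets the
   alignment rules above; false means a data-access trap would occur. */
bool is_aligned(uint32_t addr, unsigned size)
{
    return (addr & ((uint32_t)size - 1u)) == 0;
}
```
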

4.2 VIRTUAL ADDRESSING 

When address translation is enabled, the i860 Microprocessor maps instruction and data virtual 
addresses into physical addresses before referencing memory. This address transformation is 
compatible with that of the 386™ microprocessor and implements the basic features needed for 
page-oriented virtual-memory systems and page-level protection. 

The address translation is optional. Address translation is in effect only when the ATE bit of 
dirbase is set. This bit is typically set by the operating system during software initialization. The 
ATE bit must be set if the operating system is to implement page-oriented protection or page- 
oriented virtual memory. 

Address translation is disabled when the processor is reset. It is enabled when a store to dirbase 
sets the ATE bit. It is disabled again when a store clears the ATE bit. 

4.2.1 Page Frame 

A page frame is a 4K-byte unit of contiguous addresses of physical main memory. Page frames
begin on 4K-byte boundaries and are fixed in size. A page is the collection of data that occupies
a page frame when that data is present in main memory or occupies some location in secondary
storage when there is not sufficient space in main memory.

4.2.2 Virtual Address 

A virtual address refers indirectly to a physical address by specifying a page table, a page within 
that table, and an offset within that page. Figure 4-2 shows the format of a virtual address. 
Figure 4-2. Format of a Virtual Address (fields DIR, PAGE, and OFFSET)



Figure 4-3 shows how the i860 Microprocessor converts the DIR, PAGE, and OFFSET fields of 
a virtual address into the physical address by consulting two levels of page tables. The addressing 
mechanism uses the DIR field as an index into a page directory, uses the PAGE field as an index 
into the page table determined by the page directory, and uses the OFFSET field to address a byte 
within the page determined by the page table. 
Figure 4-3. Address Translation
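
Figure 4-2 is not reproduced here, but the field widths follow from Section 4.2.3. A 1K-entry table needs a 10-bit index and a 4-Kbyte page needs a 12-bit offset. That gives DIR in bits 31..22, PAGE in bits 21..12, and OFFSET in bits 11..0, the 386-compatible layout. Treat these positions as an assumption to check against the figure. The struct and function below are illustrative names only.

```c
#include <stdint.h>

typedef struct {
    uint32_t dir;     /* bits 31..22: index into the page directory */
    uint32_t page;    /* bits 21..12: index into the page table     */
    uint32_t offset;  /* bits 11..0:  byte within the 4K page       */
} va_fields;

va_fields split_va(uint32_t va)
{
    va_fields f = { va >> 22, (va >> 12) & 0x3FFu, va & 0xFFFu };
    return f;
}
```
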
4.2.3 Page Tables

A page table is simply an array of 32-bit page specifiers. A page table is itself a page and
therefore contains 4 Kbytes of memory, or at most 1K 32-bit entries.






Two levels of tables are used to address a page of memory. At the higher level is a page directory.
The page directory addresses up to 1K page tables of the second level. A page table of the second
level addresses up to 1K pages. All the tables addressed by one page directory, therefore, can
address 1M pages ($2^{20}$). Because each page contains 4 Kbytes ($2^{12}$ bytes), the tables of one page
directory can span the entire physical address space of the i860 Microprocessor
($2^{20} \times 2^{12} = 2^{32}$ bytes).

The physical address of the current page directory is stored in the DTB field of the dirbase register.
Memory management software has the option of using one page directory for all processes, one 
page directory for each process, or some combination of the two. 

4.2.4 Page-Table Entries 

Page-table entries (PTEs) in either level of page tables have the same format. Figure 4-4 illustrates 
this format. 



Figure 4-4. Format of a Page Table Entry (fields: Page Frame Address in bits 31..12, AVAIL (available for systems programmer use), Intel-reserved bits marked X (do not use), Dirty, Accessed, Cache Disable, Write-Through, User, Writable, and Present)

4.2.4.1 PAGE FRAME ADDRESS 

The page frame address specifies the physical starting address of a page. Because pages are 
located on 4K boundaries, the low-order 12 bits are always zero. In a page directory, the page 
frame address is the address of a page table. In a second-level page table, the page frame address 
is the address of the page frame that contains the desired memory operand. 
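
The page frame address occupies the high-order 20 bits of an entry. The sketch below extracts it and tests the P bit. It assumes that P, W, U, WT, CD, A, and D follow the 386-compatible layout (bits 0 through 6); check these positions against Figure 4-4. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (386-compatible layout; verify against Figure 4-4). */
#define PTE_P   (1u << 0)   /* present */
#define PTE_W   (1u << 1)   /* writable */
#define PTE_U   (1u << 2)   /* user */
#define PTE_WT  (1u << 3)   /* write-through */
#define PTE_CD  (1u << 4)   /* cache disable */
#define PTE_A   (1u << 5)   /* accessed */
#define PTE_D   (1u << 6)   /* dirty */
#define PTE_PFA 0xFFFFF000u /* page frame address, bits 31..12 */

/* Address of the page table (directory entry) or page frame (second-level
   entry); the low 12 bits are zero because frames are 4K-aligned. */
uint32_t pte_frame(uint32_t pte) { return pte & PTE_PFA; }

bool pte_present(uint32_t pte) { return (pte & PTE_P) != 0; }
```
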
4.2.4.2 PRESENT BIT

The P (present) bit indicates whether a page table entry can be used in address translation. P=1
indicates that the entry can be used. 

When P=0 in either level of page tables, the entry is not valid for address translation, and the 
rest of the entry is available for software use; none of the other bits in the entry is tested by the 
hardware. Figure 4-5 illustrates the format of a page-table entry when P=0. 





Figure 4-5. Invalid Page Table Entry (P=0; the remaining bits are available for software use)



If P=0 in either level of page tables when an attempt is made to use a page-table entry for address 
translation, the processor signals either a data-access fault or an instruction-access fault. In 
software systems that support paged virtual memory, the trap handler can bring the required page 
into physical memory. Refer to Chapter 7 for more information on trap handlers. 

Note that there is no P bit for the page directory itself. The page directory may be not-present 
while the associated process is suspended, but the operating system must ensure that the page 
directory indicated by the dirbase image associated with the process is present in physical memory 
before the process is dispatched. 

4.2.4.3 CACHE DISABLE BIT 

If the CD (cache disable) bit in the second-level page-table entry is set, data from the associated 
page is not placed in instruction or data caches. The CD bit of page directory entries is not 
referenced by the processor, but is reserved. 

4.2.4.4 WRITE-THROUGH BIT 

The i860 Microprocessor does not implement a write-through caching policy for the on-chip 
instruction and data caches; however, the WT (write-through) bit in the second-level page-table 
entry does determine internal caching policy. If WT is set in a PTE, on-chip caching from the 
corresponding page is inhibited. If WT is clear, the normal write-back policy is applied to data 
from the page in the on-chip caches. The WT bit of page directory entries is not referenced by 
the processor, but is reserved. 

To control external caches, the chip outputs on its PTB pin either CD or WT. The PBM bit of 
epsr determines which bit is output, as described in Chapter 3. 
4.2.4.5 ACCESSED AND DIRTY BITS

The A (accessed) and D (dirty) bits provide data about page usage in both levels of the page 
tables. 

The i860 Microprocessor sets the corresponding accessed bits in both levels of page tables before 
a read or write operation to a page. The processor tests the dirty bit in the second-level page table 
before a write to an address covered by that page table entry, and, under certain conditions, 
causes traps. The trap handler then has the opportunity to maintain appropriate values in the dirty 
bits. The dirty bit in directory entries is not tested by the i860 Microprocessor. The precise 
algorithm for using these bits is specified in Section 4.2.5. 

An operating system that supports paged virtual memory can use these bits to determine what 
pages to eliminate from physical memory when the demand for memory exceeds the physical 
memory available. The D and A bits in the PTE (page-table entry) are normally initialized to zero 
by the operating system. The processor sets the A bit when a page is accessed either by a read or 
write operation. When a data- or instruction-access fault occurs, the trap handler sets the D bit if 
an allowable write is being performed, then reexecutes the instruction. 

The operating system is responsible for coordinating its updates to the accessed and dirty bits with 
updates by the CPU and by other processors that may share the page tables. The i860 
Microprocessor automatically asserts the LOCK# signal while testing and setting the A bit. 

4.2.4.6 WRITABLE AND USER BITS 

The W (writable) and U (user) bits are used for page-level protection, which the i860 
Microprocessor performs at the same time as address translation. The concept of privilege for 
pages is implemented by assigning each page to one of two levels: 

1. Supervisor level (U=0): for the operating system and other systems software and related
data.

2. User level (U=1): for applications procedures and data.

The U bit of the psr indicates whether the i860 Microprocessor is executing at user or supervisor 
level. The i860 Microprocessor maintains the U bit of psr as follows: 

• The i860 Microprocessor copies the psr PU bit into the U bit when an indirect branch is 
executed and one of the trap bits is set. If PU was one, the i860 Microprocessor enters user 
level. 

• The i860 Microprocessor clears the psr U bit to indicate supervisor level when a trap occurs 
(including when the trap instruction causes the trap). The prior value of U is copied into 
PU. (The trap mechanism is described in Chapter 7; the trap instruction is described in 
Chapter 5.) 

With the U bit of psr and the W and U bits of the page table entries, the i860 Microprocessor 
implements the following protection rules: 

• When at user level, a read or write of a supervisor-level page causes a trap. 
• When at user level, a write to a page whose W bit is not set causes a trap.

• When at user level, st.c to certain control registers is ignored. 

When the i860 Microprocessor is executing at supervisor level, all pages are addressable, but
when it is executing at user level, only pages that belong to user level are addressable.

When the i860 Microprocessor is executing at supervisor level, all pages are readable. Whether a 
page is writable depends upon the write-protection mode controlled by WP of epsr: 

WP=0 All pages are writable. 

WP=1 A write to a page whose W bit is not set causes a trap.

When the i860 Microprocessor is executing at user level, only pages that belong to user level and 
are marked writable are actually writable; pages that belong to supervisor level are neither readable 
nor writable from user level. 
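
The single-level rules above can be summarized in a small predicate. This is a sketch that assumes the U and W bits passed in are already the ones in effect for the page. Combining the directory entry and the page-table entry is the subject of Section 4.2.4.7. The function name is hypothetical.

```c
#include <stdbool.h>

/* psr_u: executing at user level. pte_u, pte_w: U and W bits in effect
   for the page. wp: epsr.WP. Returns false when the access traps. */
bool access_allowed(bool psr_u, bool pte_u, bool pte_w, bool wp, bool is_write)
{
    if (psr_u) {
        if (!pte_u)
            return false;                 /* user access to a supervisor page */
        if (is_write && !pte_w)
            return false;                 /* user write to a non-writable page */
        return true;
    }
    /* Supervisor level: every page is readable; W matters only when WP = 1. */
    if (is_write && !pte_w && wp)
        return false;
    return true;
}
```
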

4.2.4.7 COMBINING PROTECTION OF BOTH LEVELS OF PAGE TABLES 

For any one page, the protection attributes of its page directory entry may differ from those of its
page table entry. The i860 Microprocessor computes the effective protection attributes for a page 
by examining the protection attributes in both the directory and the page table. Table 4-1 shows 
the effective protection provided by the possible combinations of protection attributes. 

4.2.5 Address Translation Algorithm 

The algorithm below defines how the on-chip MMU translates each virtual address to a physical 
address. Let DIR, PAGE, and OFFSET be the fields of the virtual address; let PFA1 and PFA2 
be the page frame address fields of the first and second level page tables respectively; DTB is the 
page directory table base address stored in the dirbase register. 

1. Assert LOCK#. 

2. Read the PTE (page table entry) at the physical address formed by DTB:DIR:00. 

3. If P in the PTE is zero, generate a data- or instruction-access fault. 

4. If W in the PTE is zero, the operation is a write, and either the U bit of the psr is set or 
WP=1, generate a data-access fault. 

5. If the U bit in the PTE is zero and the U bit in the psr is set, generate a data- or 
instruction-access fault. 

6. If A in the PTE is zero, set A. 

7. Locate the PTE at the physical address formed by PFA1:PAGE:00. 

8. Perform the P, A, W, and U checks as in steps 3 through 6 with the second-level PTE. 
9. If D in the PTE is clear and the operation is a write, generate a data-access fault. 

10. Form the physical address as PFA2:OFFSET. 

11. Deassert LOCK#. 
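The eleven steps above can be modeled in software as a two-level table walk. The C below is a sketch under stated assumptions, not a description of the MMU hardware: it assumes 4-Kbyte pages (a 10-bit DIR, 10-bit PAGE, and 12-bit OFFSET split of the virtual address) and hypothetical bit positions for P, W, U, A, and D, which should be checked against the page-table-entry format given earlier in this chapter. The names `phys`, `translate`, and `check_entry` are illustrative only, and LOCK# (steps 1 and 11) is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed page-table-entry bit positions (illustrative; verify against the
   page-table-entry format described earlier in Chapter 4). */
#define PTE_P   0x001u        /* present */
#define PTE_W   0x002u        /* writable */
#define PTE_U   0x004u        /* user */
#define PTE_A   0x020u        /* accessed */
#define PTE_D   0x040u        /* dirty */
#define PTE_PFA 0xFFFFF000u   /* page frame address */

/* Simulated 16-Kbyte physical memory holding the page tables. */
#define PHYS_WORDS 4096u
static uint32_t phys[PHYS_WORDS];

typedef enum { XLATE_OK, XLATE_FAULT } xlate_result;

/* Pointer to the 32-bit entry at physical byte address pa; addresses beyond
   the simulated memory wrap around. */
static uint32_t *entry_at(uint32_t pa)
{
    return &phys[(pa >> 2) % PHYS_WORDS];
}

/* Steps 3 through 6, applied to one level's entry. */
static bool check_entry(uint32_t *e, bool is_write, bool user, bool wp)
{
    if (!(*e & PTE_P))
        return false;                                /* step 3: not present */
    if (!(*e & PTE_W) && is_write && (user || wp))
        return false;                                /* step 4: write-protected */
    if (!(*e & PTE_U) && user)
        return false;                                /* step 5: supervisor-only */
    *e |= PTE_A;                                     /* step 6: mark accessed */
    return true;
}

/* dtb: page directory base from dirbase; user: psr U bit; wp: epsr WP bit. */
xlate_result translate(uint32_t dtb, uint32_t va, bool is_write, bool user,
                       bool wp, uint32_t *pa)
{
    uint32_t dir = va >> 22;                 /* DIR: bits 31..22 */
    uint32_t page = (va >> 12) & 0x3FFu;     /* PAGE: bits 21..12 */
    uint32_t offset = va & 0xFFFu;           /* OFFSET: bits 11..0 */

    uint32_t *pde = entry_at((dtb & PTE_PFA) | (dir << 2));    /* step 2: DTB:DIR:00 */
    if (!check_entry(pde, is_write, user, wp))
        return XLATE_FAULT;

    uint32_t *pte = entry_at((*pde & PTE_PFA) | (page << 2));  /* step 7: PFA1:PAGE:00 */
    if (!check_entry(pte, is_write, user, wp))                 /* step 8 */
        return XLATE_FAULT;

    if (!(*pte & PTE_D) && is_write)                           /* step 9 */
        return XLATE_FAULT;

    *pa = (*pte & PTE_PFA) | offset;                           /* step 10: PFA2:OFFSET */
    return XLATE_OK;
}
```

Because this model checks the first-level entry before the second, a fault detected at the second level leaves the first-level A bit already set, following the order of steps 6 and 7.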

Table 4-1. Combining Directory and Page Protection 

X indicates that, when the combined U attribute is supervisor and WP=0, the W attribute is not 
checked. 



4.2.6 Address Translation Faults 

The address translation fault is one instance of the data-access fault. (Refer to Chapter 7 for more 
information on this and other faults.) The instruction causing the fault can be reexecuted by the 
return-from-trap sequence defined in Chapter 7. 

4.2.7 Page Translation Cache 

For greatest efficiency in address translation, the i860 Microprocessor stores the most recently 
used page-table data in an on-chip cache called the TLB (translation lookaside buffer). Only if the 
necessary paging information is not in the cache must both levels of page tables be referenced. 
4.3 CACHING AND CACHE FLUSHING 

The i860 Microprocessor has the ability to cache instruction, data, and address-translation 
information in on-chip caches. Caching may use virtual-address tags. The effects of mapping two 
different virtual addresses in the same address space to the same physical address are undefined. 

Instruction, data, and address-translation caching on the i860 Microprocessor are not transparent. 
Writes do not immediately update memory, the TLB, or the instruction cache. Writes to memory 
by other bus devices do not update the caches. Under certain circumstances, such as I/O 
references, self-modifying code, page-table updates, or shared data in a multiprocessing system, 
it is necessary to bypass or to flush the caches. The i860 Microprocessor provides the following 
methods for doing this: 

• Bypassing Instruction and Data Caches. If deasserted during cache-miss processing, the 
KEN# pin disables instruction and data caching of the referenced data. If the CD or WT bit 
from the associated second-level PTE is set, internal caching of data and instructions is 
disabled. The value of the CD or WT bit is output on the PTB pin for use by external caches. 

• Flushing Instruction and Address-Translation Caches. Storing to the dirbase register with 
the ITI bit set invalidates the contents of the instruction and address-translation caches. This 
bit should be set when a page table or a page containing code is modified or when changing 
the DTB field of dirbase. Note that in order to make the instruction or address-translation 
caches consistent with the data cache, the data cache must be flushed before invalidating the 
other caches. 

NOTE 

The mapping of the page containing the currently executing instruction and the next 6 
instructions should not be different in the new page tables when st.c dirbase changes 
DTB or activates ITI. The 6 instructions following the st.c should be nops, and 
should lie in the same page as the st.c. 

• Flushing the Data Cache. The data cache is flushed by the software routine shown in 
Chapter 5 with the flush instruction. The data cache must be flushed prior to flushing the 
instruction or address-translation cache (as controlled by the ITI bit of dirbase) or enabling 
or disabling address translation (via the ATE bit). 

During translation, the i860 CPU searches only external memory for page directories and page 
tables; the data cache is not searched. Thus page tables and directories should be kept in 
noncacheable memory, or flushed from the cache by any code that accesses them. 
Chapter 5 
Core Instructions 



Core instructions include loads and stores of the integer, floating-point, and control registers; 
arithmetic and logical operations on the 32-bit integer registers; and control transfers. All these 
instructions are executed by the core unit. 

Key to abbreviations in the following descriptions of core instructions: 

src1            An integer register or a 16-bit immediate constant or address offset. The 
                immediate value is zero-extended for logical operations and is sign-extended 
                for add and subtract operations (including addu and subu) and for all 
                addressing calculations. 

src1ni          Same as src1 except that no immediate constant or address offset value is 
                permitted. 

src2            An integer register. 

rdest           An integer register. 

freg            A floating-point register. 

mem.x(address)  The contents of the memory location indicated by address with a size of x. 

#const          A 16-bit immediate constant or address offset that the i860 Microprocessor 
                sign-extends to 32 bits when computing the effective address. 

ctrlreg         One of the control registers fir, epsr, psr, dirbase, db, or fsr. 

lbroff          A signed, 26-bit, immediate, relative branch offset. 

sbroff          A signed, 16-bit, immediate, relative branch offset. 

brx             A function that computes the target address by shifting the offset (either 
                lbroff or sbroff) left by two bits, sign-extending it to 32 bits, and adding the 
                result to the current instruction pointer plus four. The resulting target 
                address may lie anywhere within the address space. 

src1s           An integer register or a 5-bit immediate constant that is zero-extended to 32 
                bits. 

comp2           A function that returns the two's complement of its argument. 
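The brx definition above can be sketched in C. This is a minimal model assuming that `ip` holds the address of the control-transfer instruction and `field` holds the raw offset bits from the instruction word; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* brx: sign-extend a branch offset of 'width' bits (26 for lbroff, 16 for
   sbroff), scale it by four, and add it to the instruction pointer plus four. */
uint32_t brx(uint32_t ip, uint32_t field, unsigned width)
{
    uint32_t raw = field & ((1u << width) - 1u);    /* keep only the offset bits */
    int64_t offset = (int64_t)raw;
    if (raw & (1u << (width - 1)))                  /* negative offset: sign-extend */
        offset -= (int64_t)1 << width;
    /* Instructions are 4 bytes, so the offset counts words; the conversion
       back to uint32_t wraps modulo 2^32. */
    return (uint32_t)((int64_t)ip + 4 + offset * 4);
}
```

For example, an offset field of all ones (minus one instruction) makes the branch target the branch instruction itself, since IP + 4 - 4 = IP.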

The comments regarding optimum performance that appear in the Programming Notes subsections 
are recommendations only. If these recommendations are not followed, the i860 
Microprocessor automatically waits the necessary number of clocks to satisfy internal hardware 
requirements. 
5.1 LOAD INTEGER 

ld.x src1(src2), rdest (Load Integer) 

rdest ← mem.x(src1 + src2) 

.x = .b (8 bits), .s (16 bits), or .l (32 bits) 

The load integer instruction transfers an 8-, 16-, or 32-bit value from memory to the integer 
registers. src1 can be either a 16-bit immediate address offset or an index register. Loads of 
8- or 16-bit values from memory place them in the low-order bits of the destination registers and 
sign-extend them to 32-bit values in the destination registers. 

Traps 

If the operand is misaligned, a data-access trap results. 

Programming Notes 

For best performance, observe the following guidelines: 

1. The destination of a load should not be referenced as a source operand by the next instruction. 

2. A load instruction should not directly follow a store that is expected to hit in the data cache. 

Even though immediate address offsets are limited to 16 bits, loads using a 32-bit address offset 
may be implemented by the following sequence (r31 is recommended for all such addressing 
calculations): 

orh HIGH16a, r0, r31 
ld.l LOW16(r31), rdest 

Note that the i860 Microprocessor uses signed addition when it adds LOW16 to r31. If bit 15 of 
LOW16 is set, this has the effect of subtracting from r31. Therefore, when bit 15 of LOW16 is 
set, HIGH16a must be derived by adding one to the high-order 16 bits, so that the net result is 
correct. 
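The HIGH16a adjustment can be checked numerically. The C below is a sketch of a hypothetical helper that an assembler or compiler might use to split a 32-bit address into the orh immediate and the 16-bit offset; `split_address` and `effective_address` are illustrative names, and the second function only models the signed addition described above.

```c
#include <stdint.h>

typedef struct {
    uint16_t high16a;   /* immediate for orh HIGH16a, r0, r31 */
    uint16_t low16;     /* 16-bit offset field of the load or store */
} addr_split;

addr_split split_address(uint32_t addr)
{
    addr_split s;
    s.low16 = (uint16_t)(addr & 0xFFFFu);
    /* When bit 15 of LOW16 is set, the sign-extended offset equals
       LOW16 - 0x10000, so the high half is bumped by one to compensate. */
    s.high16a = (uint16_t)((addr >> 16) + ((addr >> 15) & 1u));
    return s;
}

/* Model of the effective-address arithmetic: r31 + sign-extended LOW16. */
uint32_t effective_address(addr_split s)
{
    uint32_t r31 = (uint32_t)s.high16a << 16;            /* orh HIGH16a, r0, r31 */
    int32_t offset = (s.low16 & 0x8000u) ? (int32_t)s.low16 - 0x10000
                                         : (int32_t)s.low16;
    return r31 + (uint32_t)offset;                       /* wraps modulo 2^32 */
}
```

For the address 0x12348000, LOW16 is 0x8000 (bit 15 set), so HIGH16a becomes 0x1235 rather than 0x1234; adding the sign-extended offset (-0x8000) to 0x12350000 yields 0x12348000 again.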

The assembler must align the immediate address offsets used in loads to the same boundary as the 
effective address, because the lower bits of the immediate offset are used to encode operand 
length information. 
5.2 STORE INTEGER 

st.x src1ni, #const(src2) (Store Integer) 

mem.x(src2 + #const) ← src1ni 

.x = .b (8 bits), .s (16 bits), or .l (32 bits) 

The store instruction transfers an 8-, 16-, or 32-bit value from the integer registers to memory. 
Stores do not allow an index register in the effective-address calculation, because src1ni is used 
to specify the register to be stored. #const is a signed, 16-bit, immediate address offset. An 
absolute address may be formed by using the zero register for src2. Stores of 8- or 16-bit values 
store the low-order 8 or 16 bits of the register. 

Traps 

If the operand is misaligned, a data-access trap results. 

Programming Notes 

For best performance, a load instruction should not directly follow a store that is expected to hit 
in the data cache. 

Even though immediate address offsets are limited to 16 bits, a store using a 32-bit immediate 
address offset may be implemented by the following sequence (r31 is recommended for all such 
addressing calculations): 

orh HIGH16a, r0, r31 
st.l src1ni, LOW16(r31) 

Note that the i860 Microprocessor uses signed addition when it adds LOW16 to r31. If bit 15 of 
LOW16 is set, this has the effect of subtracting from r31. Therefore, when bit 15 of LOW16 is 
set, HIGH16a must be derived by adding one to the high-order 16 bits, so that the net result is 
correct. 

The assembler must align the immediate address offsets used in stores to the same boundary as 
the effective address, because the lower bits of the immediate offset are used to encode operand 
length information. 

5.3 TRANSFER INTEGER TO F-P REGISTER 



ixfr src1ni, freg (Transfer Integer to F-P Register) 

freg ← src1ni 

The ixfr instruction transfers a 32-bit value from an integer register to a floating-point register. 

Programming Notes 

For best performance, the destination of an ixfr should not be referenced as a source operand in 
the next two instructions. 

5.4 LOAD FLOATING-POINT 



Floating-Point Load 
fld.y src1(src2), freg (Normal) 

fld.y src1(src2)++, freg (Autoincrement) 

freg ← mem.y(src1 + src2) 
IF autoincrement 
THEN src2 ← src1 + src2 
FI 

Pipelined Floating-Point Load 
pfld.z src1(src2), freg (Normal) 

pfld.z src1(src2)++, freg (Autoincrement) 

freg ← mem.z(third previous pfld's (src1 + src2)) 
(where .z is the precision of the third previous pfld.z) 
IF autoincrement 
THEN src2 ← src1 + src2 
FI 

.y = .l (32 bits), .d (64 bits), or .q (128 bits); .z = .l or .d 

Floating-point loads transfer 32-, 64-, or 128-bit values from memory to the floating-point 
registers. These may be floating-point values or integers. An autoincrement option supports 
constant-stride vector addressing. If this option is specified, the i860 Microprocessor stores the 
effective address into src2. 

Floating-point loads may be either pipelined or not. The load pipeline has three stages. A pfld 
returns the data from the address calculated by the third previous pfld, thereby allowing three 
loads to be outstanding on the external bus. When the data is already in the cache, both pipelined 
and nonpipelined forms of the load instruction read the data from the cache. The pipelined pfld 
instruction, however, does not place the data in the data cache on a cache miss. A pfld should be 
used only when the data is expected to be used once in the near future. Data that is expected to 
be used several times before being replaced in the cache should be loaded with the nonpipelined 
fld instruction. The fld instruction does not advance the load pipeline and does not interact with 
outstanding pfld instructions. 
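The three-stage behavior of pfld can be pictured as a small queue of outstanding addresses. The C below is a hypothetical software model, not a description of the hardware: addresses are word indexes into an array standing in for memory, and bus timing and operand precision are ignored. `pfld_pipe` and `pfld_model` are illustrative names.

```c
#include <stdint.h>

#define PFLD_STAGES 3

typedef struct {
    uint32_t pending[PFLD_STAGES];  /* addresses in flight, oldest first */
    int count;                      /* number of valid entries */
} pfld_pipe;

/* Issue one pfld for 'addr'. Once three loads are outstanding, each new pfld
   delivers (via *out) the data for the address issued by the third previous
   pfld and returns 1; while the pipeline is still filling it returns 0. */
int pfld_model(pfld_pipe *p, const uint32_t *mem, uint32_t addr, uint32_t *out)
{
    int produced = 0;
    if (p->count == PFLD_STAGES) {
        *out = mem[p->pending[0]];                /* oldest load completes */
        for (int i = 1; i < PFLD_STAGES; i++)
            p->pending[i - 1] = p->pending[i];
        p->count--;
        produced = 1;
    }
    p->pending[p->count++] = addr;                /* newest load enters the pipe */
    return produced;
}
```

In this model the first three calls deliver nothing; the text above does not say what the hardware places in freg for those first pflds, so the model makes no claim about it.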

Traps 

If the operand is misaligned, a data-access trap results. 

Programming Notes 

A pfld cannot load a 128-bit operand. 

For best performance, observe the following guidelines: 

1. The destination of an fld or pfld should not be referenced as a source operand in the next two 
instructions. 

2. An fld instruction should not directly follow a store instruction that is expected to hit in the 
data cache. There is no performance impact for a pfld following a store instruction. 

3. A pfld instruction should not directly follow another pfld. 

The assembler must align the immediate address offsets used in loads to the same boundary as the 
effective address, because the lower bits of the immediate offset are used to encode operand 
length information. 

5.5 STORE FLOATING-POINT 





Floating-Point Store 

fst.y freg, src1(src2) (Normal) 

fst.y freg, src1(src2)++ (Autoincrement) 

mem.y(src2 + src1) ← freg 
IF autoincrement 
THEN src2 ← src1 + src2 
FI 

.y = .l (32 bits), .d (64 bits), or .q (128 bits) 

Floating-point stores transfer 32-, 64-, or 128-bit values from the floating-point registers to 
memory. These may be floating-point values or integers. Floating-point stores allow srcl to be 
used as an index register. An autoincrement option supports constant-stride vector addressing. If 
this option is specified, the i860 Microprocessor stores the effective address into src2. 

Traps 

If the operand is misaligned, a data-access trap results. 

Programming Notes 

For best performance, observe the following guidelines: 

1. An fld instruction should not directly follow a store instruction that is expected to hit in the 
data cache. There is no performance impact for a pfld following a store instruction. 

2. The freg of an fst.y instruction should not reference the destination of the next instruction if 
that instruction is a pipelined floating-point operation. 

The assembler must align the immediate address offsets used in stores to the same boundary as 
the effective address, because the lower bits of the immediate offset are used to encode operand 
length information. 
5.6 PIXEL STORE 

pst.d freg, #const(src2) (Pixel store) 

pst.d freg, #const(src2)++ (Pixel store autoincrement) 

Pixels enabled by PM in mem.d(src2 + #const) ← freg 

Shift PM right by 8/pixel size (in bytes) bits 

IF autoincrement THEN src2 ← #const + src2 FI 



The pixel store instruction selectively updates the pixels in a 64-bit memory location. The pixel 
size is determined by the PS field in the psr. The pixels to be updated are selected by the 
low-order bits of the PM field in the psr. Each bit of PM corresponds to one pixel, with bit 0 
corresponding to the pixel at the lowest address. 

This instruction is typically used in conjunction with the fzchks or fzchkl instructions to implement 
Z-buffer hidden-surface elimination. When used this way, a pixel is updated only when it 
represents a point that is closer to the viewer than the closest point painted so far at that particular 
pixel location. Refer to Chapter 6 for more about fzchks and fzchkl. 
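The masking performed by pst.d can be sketched in C. This is a hypothetical model that assumes little-endian byte order, so the lowest-addressed pixel occupies the low-order bits of the 64-bit word; the pixel size is passed directly in bytes (1, 2, or 4) because the PS encoding is not given in this section. `pixel_store` is an illustrative name.

```c
#include <stdint.h>

/* *pm holds the PM field of psr; bit 0 selects the lowest-addressed pixel. */
uint64_t pixel_store(uint64_t mem, uint64_t freg, uint8_t *pm, unsigned pixel_bytes)
{
    unsigned npixels = 8u / pixel_bytes;             /* pixels per 64-bit word */
    unsigned bits = pixel_bytes * 8u;                /* bits per pixel (at most 32) */
    uint64_t one_pixel = (1ull << bits) - 1u;

    for (unsigned i = 0; i < npixels; i++) {
        if (*pm & (1u << i)) {
            uint64_t m = one_pixel << (i * bits);
            mem = (mem & ~m) | (freg & m);           /* replace only the enabled pixel */
        }
    }
    *pm = (uint8_t)(*pm >> npixels);                 /* shift PM right by 8/pixel-size bits */
    return mem;
}
```

With 16-bit pixels and PM = 0x35, only pixels 0 and 2 (PM bits 0 and 2) are written, and PM becomes 0x03, so the upper PM bits are ready to select pixels for the next pst.d.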

Traps 

If the operand is misaligned, a data-access trap results. 

5.7 INTEGER ADD AND SUBTRACT 

In addition to their normal arithmetic functions, the add and subtract instructions are also used to 
implement comparisons. For this use, r0 is specified as the destination, so that the result is 
effectively discarded. Equal and not-equal comparisons are implemented with the xor instruction 
(refer to the section on logical instructions). 

Add and subtract ordinal (unsigned) can be used to implement multiple-precision arithmetic. 

Flags Affected 

CC and OF. 

Programming Notes 

For optimum performance, do not perform a conditional branch in the instruction following an 
add or subtract instruction. 

Refer to Chapter 9 for an example of how to handle the sign of 8- and 16-bit integers when 
manipulating them with 32-bit instructions. 

An instruction of the form subs -1, src2, rdest yields the one's complement of src2. 

addu src1, src2, rdest (Add unsigned) 
rdest ← src1 + src2 
OF ← bit 31 carry 
CC ← bit 31 carry 

adds src1, src2, rdest (Add signed) 
rdest ← src1 + src2 
OF ← (bit 31 carry ≠ bit 30 carry) 
Using signed comparison, 
CC set if src2 < comp2(src1) 
CC clear if src2 ≥ comp2(src1) 

subu src1, src2, rdest (Subtract unsigned) 
rdest ← src1 - src2 
OF ← NOT (bit 31 carry) 
CC ← bit 31 carry 
(i.e., using unsigned comparison, 
CC set if src2 ≤ src1 
CC clear if src2 > src1) 

subs src1, src2, rdest (Subtract signed) 
rdest ← src1 - src2 
OF ← (bit 31 carry ≠ bit 30 carry) 
Using signed comparison, 
CC set if src2 > src1 
CC clear if src2 ≤ src1 


When src1 is immediate, the immediate value is sign-extended to 32 bits even for the unsigned 
instructions addu and subu. 

These instructions enable convenient encoding of a literal operand in a subtraction, regardless of 
whether the literal is the subtrahend or the minuend. For example: 





Calculation      Signed             Unsigned 
r6 = 2 - r5      subs 2, r5, r6     subu 2, r5, r6 
r6 = r5 - 2      adds -2, r5, r6    addu -2, r5, r6 

Note that the only difference between the signed and the unsigned forms is in the setting of the 
condition code CC. 
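The flag rules listed above for addu, adds, subu, and subs can be modeled in C to check encodings such as the comparison forms that follow. This is a sketch, not a gate-level description: only rdest, CC, and OF are modeled; the signed comparisons are done in 64-bit arithmetic (so comp2 of 0x80000000 is treated as +2^31, an assumption); and the uint32_t-to-int32_t conversion assumes the two's-complement wraparound that mainstream C compilers provide. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t rdest; bool cc; bool of; } alu_result;

static int64_t s32(uint32_t v) { return (int64_t)(int32_t)v; }

alu_result i860_addu(uint32_t src1, uint32_t src2)
{
    uint64_t wide = (uint64_t)src1 + src2;
    bool carry31 = (wide >> 32) != 0;               /* carry out of bit 31 */
    alu_result r = { (uint32_t)wide, carry31, carry31 };
    return r;
}

alu_result i860_adds(uint32_t src1, uint32_t src2)
{
    int64_t sum = s32(src1) + s32(src2);
    alu_result r;
    r.rdest = src1 + src2;
    r.of = sum < INT32_MIN || sum > INT32_MAX;      /* same as bit 31 carry != bit 30 carry */
    r.cc = s32(src2) < -s32(src1);                  /* src2 < comp2(src1), signed */
    return r;
}

alu_result i860_subu(uint32_t src1, uint32_t src2)
{
    bool carry31 = src1 >= src2;                    /* no borrow out of bit 31 */
    alu_result r = { src1 - src2, carry31, !carry31 };
    return r;
}

alu_result i860_subs(uint32_t src1, uint32_t src2)
{
    int64_t diff = s32(src1) - s32(src2);
    alu_result r;
    r.rdest = src1 - src2;
    r.of = diff < INT32_MIN || diff > INT32_MAX;
    r.cc = s32(src2) > s32(src1);                   /* signed comparison */
    return r;
}
```

For example, with var = 5 and const = 7, `i860_adds((uint32_t)-7, 5)` sets CC (5 < 7, so bc branches for var < const), while `i860_addu((uint32_t)-7, 5)` clears CC (so bnc branches).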
The various forms of comparison between variables and constants can be encoded as follows: 

                 Encoding                                  Branch When True 
Condition        Signed              Unsigned              Signed    Unsigned 
var ≤ const      subs const, var     subu const, var       bnc       bc 
var < const      adds -const, var    addu -const, var*     bc        bnc 
var ≥ const      adds -const, var    addu -const, var*     bnc       bc 
var > const      subs const, var     subu const, var       bc        bnc 

* Valid only when const > 0 



5.8 SHIFT INSTRUCTIONS 



shl src1, src2, rdest (Shift left) 
rdest ← src2 shifted left by src1 bits 

shr src1, src2, rdest (Shift right) 
SC (in psr) ← src1 
rdest ← src2 shifted right by src1 bits 

shra src1, src2, rdest (Shift right arithmetic) 
rdest ← src2 arithmetically shifted right by src1 bits 

shrd src1ni, src2, rdest (Shift right double) 
rdest ← low-order 32 bits of src1ni:src2 shifted right by SC bits 

The arithmetic shift does not change the sign bit; rather, it propagates the sign bit to the right src1 
bits. 

Shift counts are taken modulo 32. A shrd right-shifts a 64-bit value with src1ni being the 
high-order 32 bits and src2 the low-order 32 bits. The shift count for shrd is taken from the shift 
count of the last shr instruction, which is saved in the SC field of the psr. Shift-left is identical for 
integers and ordinals. 

Programming Notes 

The shift instructions are recommended for the integer register-to-register move and for no- 
operations, because they do not affect the condition code. The following assembler pseudo- 
operations utilize the shift instructions: 
mov src2, rdest (Register-to-register move) 
Assembler pseudo-operation, equivalent to: 
shl r0, src2, rdest 

nop (Core no-operation) 
Assembler pseudo-operation, equivalent to: 
shl r0, r0, r0 

fnop (Floating-point no-operation) 
Assembler pseudo-operation, equivalent to: 
shrd r0, r0, r0 

Rotate is implemented by: 

shr COUNT, r0, r0     // Only loads COUNT into SC of psr 
shrd op, op, op       // Uses SC for shift count 
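The rotate idiom works because shifting the 64-bit value op:op right by COUNT brings the low-order bits of op around into the high-order bits of the result. The C below is a hypothetical model of the shift unit; the variable `sc` stands in for the SC field of psr, and storing the count modulo 32 assumes SC keeps only the low five bits of src1.

```c
#include <stdint.h>

static unsigned sc;   /* models the SC field of psr */

uint32_t i860_shr(uint32_t src1, uint32_t src2)
{
    sc = src1 % 32u;                  /* shift counts are taken modulo 32 */
    return src2 >> sc;
}

/* shrd: shift the 64-bit value src1ni:src2 right by SC, keep the low 32 bits. */
uint32_t i860_shrd(uint32_t src1ni, uint32_t src2)
{
    uint64_t pair = ((uint64_t)src1ni << 32) | src2;
    return (uint32_t)(pair >> sc);
}

/* Rotate right by count, mirroring the shr/shrd pair above. */
uint32_t rotate_right(uint32_t op, uint32_t count)
{
    (void)i860_shr(count, 0);         /* shr COUNT, r0, r0: only loads SC */
    return i860_shrd(op, op);         /* shrd op, op, op */
}
```

For example, rotating 0x12345678 right by 8 gives 0x78123456.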



5.9 SOFTWARE TRAPS 



trap src1, src2, rdest (Software trap) 
Generate trap with IT set in psr 

intovr (Software trap on integer overflow) 
If OF of epsr = 1, generate trap with IT set in psr 





These instructions generate the instruction trap, as described in Chapter 7. 

The trap instruction can be used to implement supervisor calls and code breakpoints. The rdest 
field should be zero, because its contents are undefined after the operation. The src1 and src2 fields 
can be used to encode the type of trap. 

The intovr instruction generates an instruction trap if the OF bit (overflow flag) of epsr is set. It is 
used to test for integer overflow after the instructions adds, addu, subs, and subu. 

5.10 LOGICAL INSTRUCTIONS 

The operation is performed bitwise on all 32 bits of src1 and src2. When src1 is an immediate 
constant, it is zero-extended to 32 bits. 

The "H" variant signifies "high" and forms one operand by using the immediate constant as the 
high-order 16 bits and zeros as the low-order 16 bits. The resulting 32-bit value is then used to 
operate on the src2 operand. 
and src1, src2, rdest (Logical AND) 
rdest ← src1 AND src2 
CC set if result is zero, cleared otherwise 

andh #const, src2, rdest (Logical AND high) 
rdest ← (#const shifted left 16 bits) AND src2 
CC set if result is zero, cleared otherwise 

andnot src1, src2, rdest (Logical AND NOT) 
rdest ← NOT src1 AND src2 
CC set if result is zero, cleared otherwise 

andnoth #const, src2, rdest (Logical AND NOT high) 
rdest ← NOT (#const shifted left 16 bits) AND src2 
CC set if result is zero, cleared otherwise 

or src1, src2, rdest (Logical OR) 
rdest ← src1 OR src2 
CC set if result is zero, cleared otherwise 

orh #const, src2, rdest (Logical OR high) 
rdest ← (#const shifted left 16 bits) OR src2 
CC set if result is zero, cleared otherwise 

xor src1, src2, rdest (Logical XOR) 
rdest ← src1 XOR src2 
CC set if result is zero, cleared otherwise 

xorh #const, src2, rdest (Logical XOR high) 
rdest ← (#const shifted left 16 bits) XOR src2 
CC set if result is zero, cleared otherwise 



Flags Affected 

CC is set if the result is zero, cleared otherwise. 
Programming Notes 

Bit operations can be implemented using logical operations. src1 is an immediate constant that 
contains a one in the bit position to be operated on and zeros elsewhere. 

Bit Operation      Equivalent Logical Operation 
Set bit            or 
Clear bit          andnot 
Complement bit     xor 
Test bit           and (CC set if bit is clear) 
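The logical instructions and the bit-operation idioms in the table above can be sketched in C. This is a minimal model in which each function returns rdest together with CC (set when the result is zero); immediate src1 values are assumed to be passed already zero-extended, and the "h" forms place the 16-bit constant in the high-order half. The `op_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t rdest; bool cc; } logic_result;

static logic_result with_cc(uint32_t v)
{
    logic_result r = { v, v == 0 };   /* CC set if result is zero */
    return r;
}

logic_result op_and(uint32_t src1, uint32_t src2)    { return with_cc(src1 & src2); }
logic_result op_andnot(uint32_t src1, uint32_t src2) { return with_cc(~src1 & src2); }
logic_result op_or(uint32_t src1, uint32_t src2)     { return with_cc(src1 | src2); }
logic_result op_xor(uint32_t src1, uint32_t src2)    { return with_cc(src1 ^ src2); }

/* "h" variants: #const supplies the high-order 16 bits, zeros the low-order 16. */
logic_result op_andh(uint16_t k, uint32_t src2)    { return op_and((uint32_t)k << 16, src2); }
logic_result op_andnoth(uint16_t k, uint32_t src2) { return op_andnot((uint32_t)k << 16, src2); }
logic_result op_orh(uint16_t k, uint32_t src2)     { return op_or((uint32_t)k << 16, src2); }
logic_result op_xorh(uint16_t k, uint32_t src2)    { return op_xor((uint32_t)k << 16, src2); }
```

With a mask of the form `1u << n` as src1, `op_or` sets bit n, `op_andnot` clears it, `op_xor` complements it, and `op_and` tests it, leaving CC set when the bit is clear.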



5.11 CONTROL-TRANSFER INSTRUCTIONS 

Control transfers can branch to any location within the address space. However, if a relative 
branch offset, when added to the address of the control-transfer instruction plus four, produces an 
address that is beyond the 32-bit addressing range of the i860 Microprocessor, the results are 
undefined. 

Many of the control-transfer instructions are delayed transfers. They are delayed in the sense that 
the i860 Microprocessor executes one additional instruction following the control-transfer 
instruction before actually transferring control. During the time used to execute the additional 
instruction, the i860 Microprocessor refills the instruction pipeline by fetching instructions from 
the new instruction address. This avoids breaks in the instruction execution pipeline. It is generally 
possible to find an appropriate instruction to execute after the delayed control-transfer instruction 
even if it is merely the first instruction of the procedure to which control is passed. 

Programming Notes 

The sequential instruction following a delayed control-transfer instruction may be neither another 
control-transfer instruction, nor a trap instruction, nor the target of a control-transfer instruction. 

The instructions bc.t and bnc.t are delayed forms of bc and bnc. The delayed branch instructions 
bc.t and bnc.t should be used when the branch is taken more frequently than not; for example, at 
the end of a loop. The nondelayed branch instructions bc, bnc, bte, and btne should be used when 
the branch is taken less frequently than not; for example, in certain search routines. 

If a trap occurs on a bla instruction or the next instruction, LCC is not updated. The trap handler 
resumes execution with the bla instruction, so the LCC setting is not lost. 
br lbroff (Branch direct unconditionally) 
Execute one more sequential instruction. 
Continue execution at brx(lbroff). 

bc lbroff (Branch on CC) 
IF CC = 1 
THEN continue execution at brx(lbroff) 
FI 

bc.t lbroff (Branch on CC, taken) 
IF CC = 1 
THEN execute one more sequential instruction 
     continue execution at brx(lbroff) 
ELSE skip next sequential instruction 
FI 

bnc lbroff (Branch on not CC) 
IF CC = 0 
THEN continue execution at brx(lbroff) 
FI 

bnc.t lbroff (Branch on not CC, taken) 
IF CC = 0 
THEN execute one more sequential instruction 
     continue execution at brx(lbroff) 
ELSE skip next sequential instruction 
FI 

bte src1s, src2, sbroff (Branch if equal) 
IF src1s = src2 
THEN continue execution at brx(sbroff) 
FI 

btne src1s, src2, sbroff (Branch if not equal) 
IF src1s ≠ src2 
THEN continue execution at brx(sbroff) 
FI 

bla src1ni, src2, sbroff (Branch on LCC and add) 
LCC_temp clear if src2 < comp2(src1ni) (signed) 
LCC_temp set if src2 ≥ comp2(src1ni) (signed) 
src2 ← src1ni + src2 
Execute one more sequential instruction 
IF LCC 
THEN LCC ← LCC_temp 
     continue execution at brx(sbroff) 
ELSE LCC ← LCC_temp 
FI 
Programming Notes 

The bla instruction is useful for implementing loop counters, where src2 is the loop counter and 
src1ni is set to -1. In such a loop implementation, a bla instruction may be performed before the 
loop is entered to initialize the LCC bit of the psr. The target of this bla should be the sequential 
instruction after the next, so that the next sequential instruction is executed regardless of the 
setting of LCC. Another bla instruction placed as the next to the last instruction of the loop can 
test for loop completion and update the loop counter. The total number of iterations is the value 
of src2 before the first bla instruction, plus one. Example 5-1 illustrates this use of bla. 

Programs should avoid calling subroutines while within a bla loop, because a subroutine may use 
bla also and change LCC. 



// EXAMPLE OF bla USAGE 
// Write zeros to an array of 16 single-precision numbers 
// Starting address of array is already in r4 

        adds   -1, r0, r5           // r5 <-- loop increment 
        or     15, r0, r6           // r6 <-- loop count 
        bla    r5, r6, CLEAR_LOOP   // One time to initialize LCC 
        addu   -4, r4, r4           // Start one lower to 
                                    //   allow for autoincrement 
CLEAR_LOOP: 
        bla    r5, r6, CLEAR_LOOP   // Loop for the 16 times 
        fst.l  f0, 4(r4)++          // Write and autoincrement 
                                    //   to next word 

Example 5-1. Example of bla Usage 
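The iteration count in Example 5-1 (an initial src2 of 15 giving 16 stores) can be checked with a small C model of bla's bookkeeping. This is a hypothetical sketch: the variable `lcc` stands in for the LCC bit of psr, only the counter, LCC, and taken/not-taken decision are modeled, and the delay-slot store and branch target are represented by the caller. `i860_bla` and `example_5_1_stores` are illustrative names.

```c
#include <stdint.h>
#include <stdbool.h>

static bool lcc;   /* models the LCC bit of psr */

/* Returns true when the branch is taken. */
bool i860_bla(int32_t src1ni, int32_t *src2)
{
    /* LCC_temp set if src2 >= comp2(src1ni), signed (64-bit to avoid overflow). */
    bool lcc_temp = (int64_t)*src2 >= -(int64_t)src1ni;
    *src2 = (int32_t)((uint32_t)*src2 + (uint32_t)src1ni);   /* src2 <- src1ni + src2 */
    bool taken = lcc;             /* the branch decision uses the old LCC */
    lcc = lcc_temp;
    return taken;
}

/* Count how many times the delay-slot store of Example 5-1 executes. */
int example_5_1_stores(int32_t increment, int32_t count)
{
    int stores = 0;
    bool taken;
    (void)i860_bla(increment, &count);  /* one time to initialize LCC; taken or not,
                                           execution reaches CLEAR_LOOP */
    do {
        taken = i860_bla(increment, &count);
        stores++;                       /* fst.l f0, 4(r4)++ runs in the delay slot */
    } while (taken);
    return stores;
}
```

Because the branch tests the LCC value left by the previous bla, the loop runs once more after the counter reaches zero, which is where the "plus one" in the iteration count comes from.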



Return from a subroutine is implemented by branching to the return address with the indirect 
branch instruction bri. 

Indirect branches are also used to resume execution from a trap handler (refer to Chapter 7). The 
need for this type of branch is indicated by set trap bits in the psr at the time bri is executed. In 
this case, the instruction following the bri must be a load that restores src1ni to the value it had 
before the trap occurred. 

Programming Notes 

When using bri to return from a trap handler, programmers should take care to prevent traps from 
occurring on that or on the next sequential instruction. IM should be zero (interrupts disabled). 
call lbroff (Subroutine call) 
r1 ← address of next sequential instruction + 4 
Execute one more sequential instruction 
Continue execution at brx(lbroff) 

calli [src1ni] (Indirect subroutine call) 
r1 ← address of next sequential instruction + 4 
Execute one more sequential instruction 
Continue execution at address in src1ni 
(The original contents of src1ni are used even if the 
next instruction modifies src1ni. Does not trap if 
src1ni is misaligned.) 

bri [src1ni] (Branch indirect unconditionally) 
Execute one more sequential instruction 
IF any trap bit in psr is set 
THEN copy PU to U, PIM to IM in psr 
     clear trap bits 
     IF DS is set and DIM is reset 
     THEN enter dual-instruction mode after executing one instruction in 
          single-instruction mode 
     ELSE IF DS is set and DIM is set 
          THEN enter single-instruction mode after executing one 
               instruction in dual-instruction mode 
          ELSE IF DIM is set 
               THEN enter dual-instruction mode for next two instructions 
               ELSE enter single-instruction mode for next two instructions 
               FI 
          FI 
     FI 
FI 
Continue execution at address in src1ni 
(The original contents of src1ni are used even if the next instruction modifies 
src1ni. Does not trap if src1ni is misaligned.) 



5.12 CACHE FLUSH 

The flush instruction is used to force modified data in the data cache to external memory. Because 
the contents of rdest are undefined after flush, translators should encode it as zero. The address 
#const + src2 must be aligned on a 16-byte boundary. There are two 32-byte blocks in the cache 
which can be replaced by the address #const + src2. The particular block that is forced to 
memory is controlled by the RB field of dirbase. When flushing the cache before a task switch, 
the addresses used by the flush instruction should reference non-user-accessible memory to ensure 
that cached data from the old task is not transferred to the new task. These addresses must be 
Cache Flush 

flush #const(src2) (Normal) 

flush #const(src2)++ (Autoincrement) 

Replace the block in data cache that has address (#const + src2). 
Contents of block undefined. 
IF autoincrement 
THEN src2 ← #const + src2 
FI 







Example 5-2 shows how to flush the data cache using the flush instruction. The code depends on 
having reserved a 4 Kbyte memory area that is not used to store data. Cache elements containing 
modified data are written back to memory by making two passes, each of which references every 
32nd byte of this area with the flush instruction. Before the first pass, the RC field in dirbase is 
set to two and RB is set to zero. This causes data-cache misses to flush element zero of each set. 
Before the second pass, RB is changed to one, causing element one of each set to be flushed. 

The flush instruction must only be used as in Example 5-2. Any other usage of flush has 
undefined results. 



// CACHE FLUSH PROCEDURE
// Rw, Rx, Ry, Rz represent integer registers
// FLUSH_P_H is the high-order 16 bits of a pointer to reserved area
// FLUSH_P_L is the low-order 16 bits of the pointer, minus 32

        ld.c    dirbase, Rz
        or      0x800, Rz, Rz           // RC <-- 0b10 (assuming was 00)
        adds    -1, r0, Rx              // Rx <-- -1 (loop increment)
        call    D_FLUSH
        st.c    Rz, dirbase             // Replace in block 0
        or      0x900, Rz, Rz           // RB <-- 0b01
        call    D_FLUSH
        st.c    Rz, dirbase             // Replace in block 1
        xor     0x900, Rz, Rz           // Clear RC and RB
                                        // Change DTB, ATE, or ITI fields here, if necessary
        st.c    Rz, dirbase

D_FLUSH:
        orh     FLUSH_P_H, r0, Rw       // Rw <-- address minus 32
        or      FLUSH_P_L, Rw, Rw       //        of flush area
        or      127, r0, Ry             // Ry <-- loop count
        ld.l    32(Rw), r31             // Clear any pending bus writes
        shl     0, r31, r31             // Wait until load finishes
        bla     Rx, Ry, D_FLUSH_LOOP    // One time to initialize LCC
        nop
D_FLUSH_LOOP:
        bla     Rx, Ry, D_FLUSH_LOOP    // Loop; execute next instruction
                                        //   for 128 lines in cache block
        flush   32(Rw)++                // Flush and autoincrement to next line
        bri     r1                      // Return after next instruction
        ld.l    -512(Rw), r0            // Load from flush area to clear pending
                                        //   writes. A hit is guaranteed.

Example 5-2. Cache Flush Procedure 
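The address and dirbase arithmetic in Example 5-2 can be replayed in a short Python sketch. The cache itself is not modeled. The sketch assumes that RC occupies dirbase bits 11..10 and RB occupies bits 9..8, which is what the constants 0x800 (RC <- 0b10) and 0x900 (RB <- 0b01) in the example imply. The reserved-area base address used in the usage check is hypothetical.

```python
from __future__ import annotations

# Sketch of the address/dirbase arithmetic performed by Example 5-2 (no cache model).
# Assumption: RC is dirbase bits 11..10 and RB is bits 9..8, as implied by the
# example's constants 0x800 (RC <- 0b10) and 0x900 (RB <- 0b01).
RC_SHIFT, RB_SHIFT = 10, 8
LINE_BYTES = 32       # flush touches every 32nd byte of the reserved area
LINES_PER_PASS = 128  # Ry = 127 with Rx = -1 gives 128 loop iterations


def rc_rb(dirbase: int) -> tuple[int, int]:
    """Return the (RC, RB) fields of a dirbase value."""
    return (dirbase >> RC_SHIFT) & 0b11, (dirbase >> RB_SHIFT) & 0b11


def flush_pass(area_base: int) -> tuple[list[int], int]:
    """Addresses flushed by one D_FLUSH pass, plus the final value of Rw.

    Rw starts at area_base - 32. Each `flush 32(Rw)++` replaces the block at
    Rw + 32 and then writes Rw + 32 back into Rw (autoincrement).
    """
    rw = area_base - LINE_BYTES
    flushed = []
    for _ in range(LINES_PER_PASS):
        flushed.append(rw + LINE_BYTES)  # block at #const + src2
        rw += LINE_BYTES                 # src2 <- #const + src2
    return flushed, rw


def flush_procedure(dirbase: int, area_base: int):
    """Replay the dirbase updates of Example 5-2.

    Returns a list of ((RC, RB), flushed_addresses) per pass and the final
    dirbase value written by the last st.c.
    """
    passes = []
    rz = dirbase | 0x800   # RC <- 0b10 (assuming RC and RB were 00)
    passes.append((rc_rb(rz), flush_pass(area_base)[0]))  # replace in block 0
    rz |= 0x900            # RB <- 0b01
    passes.append((rc_rb(rz), flush_pass(area_base)[0]))  # replace in block 1
    rz ^= 0x900            # clear RC and RB again
    return passes, rz
```

Both passes walk the same 4 Kbyte area. Only the RB field changes between them, and that field selects which element of each set is replaced.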
5.13 CONTROL REGISTER ACCESS

ld.c ctrlreg, rdest         (Load from control register)
        rdest ← ctrlreg

st.c src1ni, ctrlreg        (Store to control register)
        ctrlreg ← src1ni






Ctrlreg specifies a control register that is transferred to or from a general-purpose register. The 
function of each control register is defined in Chapter 3. As shown below, some registers or parts 
of registers are write-protected when the U-bit in the psr is set. A store to those registers or bits 
is ignored when the i860 Microprocessor is in user mode. Ctrlreg is specified by a code in the 
src2 field of the instruction, as defined by Table 5-1. 

Table 5-1. Control Register Encoding

Register                    Src2 Code    User-Mode Write-Protected?
Fault Instruction           0            N/A
Processor Status            1            Yes*
Directory Base              2            Yes
Data Breakpoint             3            Yes
Floating-Point Status       4            No
Extended Process Status     5            Yes**

*  Only the psr bits BR, BW, PIM, IM, PU, U, IT, IN, IAT, DAT, FT, DS, DIM, and KNF are write-protected.
** The processor type, stepping number, and cache size cannot be changed from either user or supervisor level.
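Table 5-1 can also be expressed as a lookup table. The following is a sketch only. The short register names (fir, psr, dirbase, db, fsr, epsr) are assumed to follow the usual i860 mnemonics, and Chapter 3 remains authoritative. Code 0 for fir is taken from the table's sequential ordering. Individual psr and epsr bit positions are deliberately not modeled.

```python
from __future__ import annotations

# Table 5-1 as data. Register short names are assumptions (usual i860 mnemonics).
CTRLREG = {
    # name:      (src2 code, user-mode write protection)
    "fir":     (0, "n/a"),
    "psr":     (1, "partial"),  # only the psr bits listed under * are protected
    "dirbase": (2, "yes"),
    "db":      (3, "yes"),
    "fsr":     (4, "no"),
    "epsr":    (5, "yes"),      # some fields are read-only at every privilege level
}

CODE_TO_NAME = {code: name for name, (code, _) in CTRLREG.items()}


def user_mode_store_effect(name: str) -> str:
    """Describe what an st.c to `name` does while the U-bit of psr is set."""
    protection = CTRLREG[name][1]
    return {
        "no": "written",
        "yes": "ignored",
        "partial": "only unprotected bits written",
        "n/a": "not applicable",
    }[protection]
```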

Programming Notes 

Saving fir (the fault instruction register) at any time except the first time after a trap occurs saves the 
address of the ld.c instruction. 

After a scalar floating-point operation, a st.c to fsr should not change the value of RR, RM, or 
FZ until the point at which result exceptions are reported. (Refer to Chapter 7 for more details.) 

Only a trap handler should use the instruction st.c to set the trap bits (IT, IN, IAT, DAT, FT) of 
the psr. 

5.14 BUS LOCK 

These instructions allow programs running in either user or supervisor mode to perform 
read-modify-write sequences in multiprocessor and multithread systems. The interlocked sequence must 
not branch outside of the 32 sequential instructions following the lock instruction. The sequence 
must be restartable from the lock instruction in case a trap occurs. Simple read-modify-write 
sequences are automatically restartable. For sequences with more than one store, the software 
must ensure that no traps occur after the first non-reexecutable store. To ensure that no data access 
fault occurs, it must first store unmodified values in the other store locations. To ensure that no 
instruction access fault occurs, the code that is not restartable should not span a page boundary. 



lock (Begin interlocked sequence) 

Set BL in dirbase. The next load or store that misses the cache locks the bus. 
Disable interrupts until the bus is unlocked. 

unlock (End interlocked sequence) 

Clear BL in dirbase. The next load or store that misses the cache unlocks 
the bus. 



After a lock instruction, the bus is not locked until the first data access that misses the data cache. 
Software in a multiprocessing system should ensure that the first load instruction after a lock 
references noncacheable memory. Likewise, after an unlock instruction, the bus is not unlocked 
until the first data access that misses the data cache. Software in a multiprocessing system should 
ensure that the first load or store instruction after an unlock references noncacheable memory. 

If a trap occurs after a lock instruction and before the load or store that follows the corresponding 
unlock, the processor clears BL and sets the IL (interlock) bit of epsr. 

If the processor encounters another lock instruction before unlocking the bus, that instruction is 
ignored. 

If, following a lock instruction, the processor does not encounter a load or store following an 
unlock instruction by the time it has executed 32 instructions, it triggers an instruction fault on 
the 32nd instruction. In such a case, the trap handler will find both IL and IT set. 
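The lock and unlock rules above can be summarized as a small state machine. The following Python sketch is a toy instruction-level model, not a cycle-accurate one, and its class and method names are illustrative. Only data accesses that miss the data cache change the bus state. The first load or store after unlock ends the sequence. A nested lock is counted but otherwise ignored. A sequence still open after 32 instructions traps with IL and IT set.

```python
from __future__ import annotations


class BusLockModel:
    """Toy model of the lock/unlock rules in 5.14 (illustrative names only)."""

    LIMIT = 32  # instructions allowed after lock before an instruction fault

    def __init__(self) -> None:
        self.bl = False           # BL bit of dirbase
        self.bus_locked = False   # state of the external bus lock
        self.il = False           # IL (interlock) bit of epsr
        self.it = False           # instruction-trap bit
        self.count = None         # instructions since lock; None = no open sequence
        self.unlock_seen = False

    def _access(self, miss: bool) -> None:
        # The bus changes state only on a data access that misses the cache.
        if miss and self.bl and not self.bus_locked:
            self.bus_locked = True
        elif miss and not self.bl and self.bus_locked:
            self.bus_locked = False

    def _runaway_trap(self) -> None:
        self.bl = False   # the processor clears BL ...
        self.il = True    # ... and sets IL
        self.it = True    # the 32nd instruction raises an instruction fault
        self.count = None

    def step(self, op: str, miss: bool = False) -> None:
        """Execute one of 'lock', 'unlock', 'load', 'store', or 'other'."""
        active = self.count is not None
        if op == "lock" and not active:
            self.bl, self.unlock_seen, self.count = True, False, 0
            return
        if not active:
            if op in ("load", "store"):
                self._access(miss)
            return
        self.count += 1           # a nested lock is counted but otherwise ignored
        if op == "unlock":
            self.bl = False
            self.unlock_seen = True
        elif op in ("load", "store"):
            self._access(miss)
            if self.unlock_seen:  # first load/store after unlock ends the sequence
                self.count = None
                return
        if self.count >= self.LIMIT:
            self._runaway_trap()
```

The model shows why the text asks for noncacheable references right after lock and unlock. If those accesses hit the cache, the bus state would not change until some later miss.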

Example 5-3 shows how lock and unlock can be used in a variety of interlocked operations. 
// LOCKED TEST AND SET
// Value to put in semaphore is in r23
        lock
        ld.b    semaphore, r22      // Put current value of semaphore in r22
        unlock
        st.b    r23, semaphore

// LOCKED LOAD-ALU-STORE
        lock
        ld.l    word, r22
        addu    1, r22, r22         // Can be any ALU operation
        unlock
        st.l    r22, word

// LOCKED COMPARE AND SWAP
// Swaps r23 with word in memory, if word = r21
        lock
        ld.l    word, r22
        bte     r22, r21, L1
        mov     r22, r23            // Executed only if not equal
L1:     unlock
        st.l    r23, word

Example 5-3. Examples of lock and unlock Usage 
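To make clear what each interlocked sequence in Example 5-3 computes, here is a Python sketch of their semantics. A `threading.Lock` stands in for the locked bus, so everything from lock through the store after unlock runs as one indivisible step. The memory class and method names are hypothetical. Byte and word widths are not modeled, apart from 32-bit wraparound in the ALU case.

```python
from __future__ import annotations

import threading


class InterlockedMemory:
    """What the three sequences of Example 5-3 compute (hypothetical names)."""

    def __init__(self) -> None:
        self.words: dict[str, int] = {}
        self.bus = threading.Lock()   # stand-in for the locked bus

    def test_and_set(self, addr: str, r23: int) -> int:
        with self.bus:
            r22 = self.words.get(addr, 0)   # ld.b semaphore, r22
            self.words[addr] = r23          # st.b r23, semaphore
        return r22                          # previous semaphore value

    def load_alu_store(self, addr: str) -> int:
        with self.bus:
            r22 = self.words.get(addr, 0)   # ld.l word, r22
            r22 = (r22 + 1) & 0xFFFFFFFF    # addu 1, r22, r22
            self.words[addr] = r22          # st.l r22, word
        return r22

    def compare_and_swap(self, addr: str, r21: int, r23: int) -> tuple[int, int]:
        with self.bus:
            r22 = self.words.get(addr, 0)   # ld.l word, r22
            if r22 != r21:                  # bte r22, r21, L1 not taken
                r23 = r22                   # mov r22, r23
            self.words[addr] = r23          # st.l r23, word
        return r22, r23
```

In the compare-and-swap case, the store always executes. When the comparison fails, it simply writes the old value back, which keeps the sequence a simple, restartable read-modify-write.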
Chapter 6 
Floating-Point Instructions 



The floating-point section of the i860 Microprocessor comprises the floating-point registers and 
three processing units: 

1 . The floating-point multiplier 

2. The floating-point adder 

3. The graphics unit 

This section of the i860 Microprocessor executes not only floating-point operations but also 64- 
bit integer operations and graphics operations that utilize the 64-bit internal data path of the 
floating-point section. 

Floating-point instruction operands src1, src2, and rdest refer to one of the 32 floating-point 
registers; ireg refers to one of the integer registers. 

6.1 PRECISION SPECIFICATION 

Unless otherwise specified, floating-point operations accept single- or double-precision source 
operands and produce a result of equal or greater precision. Both input operands must have the 
same precision. The source and result precision are specified by a two-letter suffix to the 
mnemonic of the operation, as shown below. In this manual, the suffix .p refers to the precision 
specification. In an actual program, .p is to be replaced by the appropriate two-letter suffix. 



Suffix    Source Precision    Result Precision
.ss       single              single
.sd       single              double
.dd       double              double



6.2 PIPELINED AND SCALAR OPERATIONS 

The architecture of the floating-point unit uses parallelism to increase the rate at which operations 
may be introduced into the unit. One type of parallelism used is called "pipelining". The 
pipelined architecture treats each operation as a series of more primitive operations (called 
"stages") that can be executed in parallel. Consider just the floating-point adder unit as an 
example. Let A represent the operation of the adder. Let the stages be represented by $A_1$, $A_2$, 
and $A_3$. The stages are designed such that $A_{i+1}$ for one adder instruction can execute in parallel 
with $A_i$ for the next adder instruction. Furthermore, each $A_i$ can be executed in just one clock. 
The pipelining within the multiplier and graphics units can be described similarly, except that the 
number of stages may be different. 

Figure 6-1 illustrates three-stage pipelining as found in the floating-point adder (also in the 
floating-point multiplier when single-precision input operands are employed). The columns of the 
figure represent the three stages of the pipeline. Each stage holds intermediate results and also 
(when introduced into the first stage by software) holds status information pertaining to those 
results. The figure assumes that the instruction stream consists of a series of consecutive 
floating-point instructions, all of one type (i.e., all adder instructions or all single-precision 
multiplier instructions). The instructions are represented as i, i+1, etc. The rows of the figure 
represent the states of the unit at successive clock cycles. Each time a pipelined operation is 
performed, the status of the last stage becomes available in fsr, the result of the last stage of the 
pipeline is stored in the destination register rdest, the pipeline is advanced one stage, and the input 
operands src1 and src2 are transferred to the first stage of the pipeline. 

Clock    Instruction    Stage 1    Stage 2    Stage 3    Stored in rdest
m        i              i
m+1      i+1            i+1        i
m+2      i+2            i+2        i+1        i
m+3      i+3            i+3        i+2        i+1        result of i
m+4      i+4            i+4        i+3        i+2        result of i+1
m+5      i+5            i+5        i+4        i+3        result of i+2

(Each stage entry holds a result r together with its status (s).)

Figure 6-1. Pipelined Instruction Execution 

In the i860 Microprocessor, the number of pipeline stages ranges from one to three. A pipelined 
operation with a three-stage pipeline stores the result of the third prior operation. A pipelined 
operation with a two-stage pipeline stores the result of the second prior operation. A pipelined 
operation with a one-stage pipeline stores the result of the prior operation. 
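The rule that an n-stage pipelined operation stores the result of the nth prior operation can be checked with a minimal Python sketch. The class name and the lambda operations are placeholders. None stands for stage contents that are undefined at start-up.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


class PipelinedUnit:
    """Sketch of an n-stage unit driven only by pipelined instructions.

    Each issue returns the last-stage result (the value stored in rdest),
    advances the pipeline one stage, and places the new operation in the
    first stage. None marks stage contents that are undefined at start-up.
    """

    def __init__(self, stages: int, op: Callable[[float, float], float]) -> None:
        self.stages: deque[Optional[float]] = deque([None] * stages)  # [0] = first stage
        self.op = op

    def issue(self, src1: float, src2: float) -> Optional[float]:
        rdest = self.stages.pop()                    # last-stage result
        self.stages.appendleft(self.op(src1, src2))  # new first stage
        return rdest
```

For example, a three-stage adder fed (1, 2), (3, 4), (5, 6), (7, 8), and then three dummy (0, 0) operations returns None, None, None, 3, 7, 11, 15. A two-stage unit, such as the multiplier with double-precision inputs, returns each product two issues later.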

There are four floating-point pipelines: one for the multiplier, one for the adder, one for the 
graphics unit, and one for floating-point loads. The adder pipeline has three stages. The number 
of stages in the multiplier pipeline depends on the precision of the source operands in the pipeline; 
it may have two or three stages. The graphics unit has one stage for all precisions. The load 
pipeline has three stages for all precisions. 

Changing the FZ (flush zero), RM (rounding mode), or RR (result register) bits of fsr while there 
are results in either the multiplier or adder pipeline produces effects that are not defined. 

6.2.1 Scalar Mode 

In addition to the pipelined execution mode described above, the i860 Microprocessor also can 
execute floating-point instructions in "scalar" mode. Most floating-point instructions have both 
pipelined and scalar variants, distinguished by a bit in the instruction encoding. In scalar mode, 
the floating-point unit does not start a new operation until the previous floating-point operation is 
completed. The scalar operation passes through all stages of its pipeline before a new operation is 
introduced, and the result is stored automatically. Scalar mode is used when the next operation 
depends on results from the previous few floating-point operations (or when the compiler or 
programmer does not want to deal with pipelining). 

6.2.2 Pipelining Status Information 

Result status information in the fsr consists of the AA, AI, AO, AU, and AE bits, in the case of 
the adder, and the MA, MI, MO, and MU bits, in the case of the multiplier. This information 
arrives at the fsr via the pipeline in one of two ways: 

1. It is calculated by the last stage of the pipeline. This is the normal case. 

2. It is propagated from the first stage of the pipeline. This method is used when restoring the 
state of the pipeline after a preemption. When a store instruction updates the fsr and the 
U bit being written into the fsr is set, the store updates result-status bits in the first stage of 
both the adder and multiplier pipelines. When software changes the result-status bits of the 
first stage of a particular unit (multiplier or adder), the updated result-status bits are propagated 
one stage for each pipelined floating-point operation for that unit. In this case, each stage of 
the adder and multiplier pipelines holds its own copy of the relevant bits of the fsr. When 
they reach the last stage, they override the normal result-status bits computed from the 
last-stage result. 

At the next floating-point instruction (or at certain core instructions), after the result reaches the 
last stage, the i860 Microprocessor traps if any of the status bits of the fsr indicate exceptions. 
Note that the instruction that creates the exceptional condition is not the instruction at which the 
trap occurs. 

6.2.3 Precision in the Pipelines 

In pipelined mode, when a floating-point operation is initiated, the result of an earlier pipelined 
floating-point operation is returned. The result precision of the current instruction applies to the 
operation being initiated. The precision of the value stored in rdest is that which was specified by 
the instruction that initiated that operation. 

If rdest is the same as src1 or src2, the value being stored in rdest is used as the input operand. 
In this case, the precision of rdest must be the same as the source precision. 

The multiplier pipeline has two stages when the source operand is double-precision and three 
stages when the precision of the source operand is single. This means that a pipelined multiplier 
operation stores the result of the second previous multiplier operation for double-precision inputs 
and third previous for single-precision inputs (except when mixing precisions). 

6.2.4 Transition between Scalar and Pipelined Operations 

When a scalar operation is executed in the adder, multiplier, or graphics units, it passes through 
all stages of the pipeline; therefore, any unstored results in the affected pipeline are lost. To avoid 
losing information, the last pipelined operations before a scalar operation should be dummy 
pipelined operations that extract results from the affected pipeline. 

After a scalar operation, the values of all pipeline stages of the affected unit (except the last) are 
undefined. No spurious result-exception traps result when the undefined values are subsequently 
stored by pipelined operations; however, the values should not be referenced as source operands. 

Note that the pfld pipeline is not affected by scalar fld or ld instructions. 

For best performance a scalar operation should not immediately precede a pipelined operation 
whose rdest is nonzero. 
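The advice to drain a pipeline with dummy pipelined operations before a scalar operation can be illustrated with a Python sketch. The class and helper names are hypothetical. The text says only that all stages except the last become undefined after a scalar operation. Keeping the scalar result in the last stage is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Callable

UNDEFINED = object()  # marker for stage contents the text calls undefined


class Unit:
    """Sketch of one floating-point unit accepting pipelined and scalar ops."""

    def __init__(self, stages: int, op: Callable[[float, float], float]) -> None:
        self.n = stages
        self.op = op
        self.stages: list = [UNDEFINED] * stages   # index 0 = first stage

    def pipelined(self, src1: float, src2: float):
        rdest = self.stages[-1]                     # last-stage result
        self.stages = [self.op(src1, src2)] + self.stages[:-1]
        return rdest

    def scalar(self, src1: float, src2: float) -> float:
        result = self.op(src1, src2)
        # A scalar op passes through every stage: unstored pipelined results are
        # lost and every stage but the last becomes undefined. Keeping the scalar
        # result in the last stage is an assumption of this sketch.
        self.stages = [UNDEFINED] * (self.n - 1) + [result]
        return result


def drain(unit: Unit) -> list:
    """Issue dummy pipelined ops (0.0, 0.0) until every pending result is stored."""
    out = [unit.pipelined(0.0, 0.0) for _ in range(unit.n)]
    return [r for r in out if r is not UNDEFINED]
```

With draining, results 3.0 and 7.0 from two pending additions are recovered before the scalar operation runs. Without it, the pending 3.0 simply disappears from the unit.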

6.3 MULTIPLIER INSTRUCTIONS 

The multiplier unit of the floating-point section performs not only the standard floating-point 
multiply operation but also provides reciprocal operations that can be used to implement floating- 
point division and provides a special type of multiply that assists in coding integer multiply 
sequences. The multiply instruction can be pipelined. 
Programming Notes 

Complications arise with sequences of pipelined multiplier operations with mixed single- and 
double-precision inputs because the pipeline length is different for the two precisions. The 
complications can be avoided by not mixing the two precisions; i.e., by flushing out all single- 
precision operations with dummy single-precision operations before starting double-precision 
operations, and vice versa. For the adventuresome, the rules for mixing precisions follow: 

• Single to Double Transitions. When a pipelined multiplier operation with double-precision 
inputs is executed and the previous multiplier operation was pipelined with single-precision 
inputs, the third previous (last stage) result is stored, and the previous operation (first stage) 
is advanced to the second stage (now the last stage). The second previous operation (old 
second stage) is discarded. The next pipelined multiplier operation stores the single-precision 
result. 

• Double to Single Transitions. When a pipelined multiplier operation with single-precision 
inputs is executed and the previous multiplier operation was pipelined with double-precision 
inputs, the previous multiplier operation is advanced to the second stage and a single- or 
double-precision zero is placed in the last stage of the pipeline. The next pipelined multiplier 
operation stores zero instead of the result of the prior operation. 
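The two transition rules can be traced with a Python sketch of the multiplier's stage bookkeeping. The class name is hypothetical. 's' and 'd' denote single- and double-precision inputs. The text does not say what the double-to-single transition instruction itself stores. This sketch assumes it stores the last-stage result as usual.

```python
from __future__ import annotations


class MixedPrecisionMultiplier:
    """Stage bookkeeping of the pipelined multiplier under mixed precisions.

    stages[0] is the first stage. The pipeline is 3 deep after single-precision
    ('s') input ops and 2 deep after double-precision ('d') ones. None marks
    contents that are undefined at start-up.
    """

    def __init__(self) -> None:
        self.stages: list = []
        self.prev: str | None = None  # precision of the previous pipelined op

    def pfmul(self, src1: float, src2: float, prec: str):
        product = src1 * src2
        if self.prev is None:
            self.stages = [None] * (3 if prec == "s" else 2)
        if self.prev == "s" and prec == "d":
            # Store the third previous (last-stage) result, discard the old
            # second stage, and move the old first stage into the new last stage.
            rdest = self.stages[2]
            self.stages = [product, self.stages[0]]
        elif self.prev == "d" and prec == "s":
            # Assumption: the last-stage result is stored as usual. The previous
            # op moves to stage 2 and a zero is placed in the last stage.
            rdest = self.stages[1]
            self.stages = [product, self.stages[0], 0.0]
        else:
            rdest = self.stages[-1]
            self.stages = [product] + self.stages[:-1]
        self.prev = prec
        return rdest
```

Feeding the products 1, 4, and 9 (single), then 25 and 36 (double), then 49, 64, and 81 (single) stores None, None, None, 1, 9, 25, 0.0, 36. The single-precision result 4 is discarded at the first transition, and a zero appears at the second, matching the two bulleted rules.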

6.3.1 Floating-Point Multiply 



fmul.p src1, src2, rdest            (Floating-Point Multiply)
        rdest ← src1 × src2

pfmul.p src1, src2, rdest           (Pipelined Floating-Point Multiply)
        rdest ← last M-stage result
        Advance M pipeline one stage
        M pipeline first stage ← src1 × src2

pfmul3.dd src1, src2, rdest         (Three-Stage Pipelined Multiply)
        rdest ← last M-stage result
        Advance 3-stage M pipeline one stage
        M pipeline first stage ← src1 × src2



These instructions perform a standard multiply operation. 

Programming Notes 

Src1 must not be the same as rdest for pipelined operations. For best performance when the prior 
operation is scalar, src1 should not be the same as the rdest of the prior operation. 

The pfmul3.dd instruction is intended primarily for use by exception handlers in restoring pipeline 
contents (refer to "Pipeline Preemption" in Chapter 7). It should not be mixed in instruction 
sequences with other pipelined multiplier instructions. 
6.3.2 Floating-Point Multiply Low 



fmlow.dd src1, src2, rdest          (Floating-Point Multiply Low)
        rdest ← low-order 53 bits of (src1 significand × src2 significand)
        rdest bit 53 ← most significant bit of (src1 significand × src2 significand)



The fmlow instruction multiplies the low-order bits of its operands. It operates only on 
double-precision operands. The high-order 10 bits of the result are undefined. 

An fmlow can perform 32-bit integer multiplies. Two 64-bit values are formed, with the integers 
in the low-order 32 bits. The low-order 32 bits of the result are the same as the low-order 32 bits 
of an integer multiply. The fmlow instruction does not update the result-status bits of fsr and does 
not cause source- or result-exception traps. 
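The reason fmlow yields a correct 32-bit integer product is modular arithmetic. Because $2^{32}$ divides $2^{53}$, reducing the product modulo $2^{53}$ first does not change its low-order 32 bits. Even if a hidden significand bit at position 52 were included, its contributions are multiples of $2^{52}$ and vanish modulo $2^{32}$. The following Python sketch is a simplified model. It treats the operands as raw 64-bit register images and does not model the bit-53 rule or the undefined high-order bits. The function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK53 = (1 << 53) - 1


def fmlow_low_bits(x_bits: int, y_bits: int) -> int:
    """Low 53 bits of the product of the low 53 bits of two register images.

    Simplified model: the bit-53 rule and undefined high-order bits are ignored.
    """
    return ((x_bits & MASK53) * (y_bits & MASK53)) & MASK53


def imul32(a: int, b: int) -> int:
    """32-bit integer multiply done the way the text describes for fmlow."""
    x = a & MASK32   # twos-complement integer in the low-order 32 bits
    y = b & MASK32
    return fmlow_low_bits(x, y) & MASK32


def as_signed32(v: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    return v - (1 << 32) if v & (1 << 31) else v
```

Signed operands work too, because the low 32 bits of a product depend only on the low 32 bits of its factors. For example, `as_signed32(imul32(-7, 6))` gives -42.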

6.3.3 Floating-Point Reciprocals 



frcp.p src2, rdest                  (Floating-Point Reciprocal)
        rdest ← 1 / src2 with absolute significand error < 2^-7

frsqr.p src2, rdest                 (Floating-Point Reciprocal Square Root)
        rdest ← 1 / √src2 with absolute significand error < 2^-7





The frcp and frsqr instructions are intended to be used with algorithms such as the Newton-Raphson 
approximation to compute divide and square root. Assemblers and compilers must set src1 to zero. 
A Newton-Raphson approximation may produce a result that is different from the IEEE standard in 
the two least significant bits of the mantissa. A library routine supplied by Intel may be used to 
calculate the correct IEEE-standard rounded result. 

Traps 

The instructions frcp and frsqr cause the source-exception trap if src2 is zero. An frsqr causes 
the source-exception trap if src2 < 0. 
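The Newton-Raphson refinement that frcp and frsqr are designed for can be sketched in Python. The hardware seed is modeled by truncating the exact reciprocal (or reciprocal square root) to 7 significand bits. That precision is an assumption of this sketch, chosen to match a seed error bound of about $2^{-7}$. The function names are hypothetical, and this is not Intel's library routine. Each step roughly squares the relative error, so three iterations bring a 7-bit seed to double precision, give or take the last bits noted above.

```python
import math

# Assumption: the seed is the exact value truncated to SEED_BITS significand bits,
# giving a relative error below 2**-SEED_BITS.
SEED_BITS = 7


def _truncate(x: float, bits: int = SEED_BITS) -> float:
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << (bits + 1)
    return math.ldexp(math.floor(m * scale) / scale, e)


def frcp_model(src2: float) -> float:
    """Stand-in for frcp: a low-precision 1/src2."""
    if src2 == 0.0:
        raise ZeroDivisionError("source-exception trap: src2 is zero")
    return _truncate(1.0 / src2)


def frsqr_model(src2: float) -> float:
    """Stand-in for frsqr: a low-precision 1/sqrt(src2)."""
    if src2 <= 0.0:
        raise ValueError("source-exception trap: src2 is zero or negative")
    return _truncate(1.0 / math.sqrt(src2))


def divide(a: float, b: float, iterations: int = 3) -> float:
    y = frcp_model(b)
    for _ in range(iterations):
        y = y * (2.0 - b * y)             # Newton-Raphson step for 1/b
    return a * y


def square_root(x: float, iterations: int = 3) -> float:
    y = frsqr_model(x)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)   # Newton-Raphson step for 1/sqrt(x)
    return x * y
```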

6.4 ADDER INSTRUCTIONS 

The adder unit of the floating-point section provides floating-point addition, subtraction, and 
comparison, as well as conversion from floating-point to integer formats. 
6.4.1 Floating-Point Add and Subtract 



fadd.p src1, src2, rdest            (Floating-Point Add)
        rdest ← src1 + src2

pfadd.p src1, src2, rdest           (Pipelined Floating-Point Add)
        rdest ← last A-stage result
        Advance A pipeline one stage
        A pipeline first stage ← src1 + src2

fsub.p src1, src2, rdest            (Floating-Point Subtract)
        rdest ← src1 - src2

pfsub.p src1, src2, rdest           (Pipelined Floating-Point Subtract)
        rdest ← last A-stage result
        Advance A pipeline one stage
        A pipeline first stage ← src1 - src2



These instructions perform standard addition and subtraction operations. 

Programming Notes 

In order to allow conversion from double precision to single precision, an fadd or pfadd 
instruction may have double-precision inputs and a single-precision output, as long as one of its 
input operands is f0. In assembly language, this conversion is specified using the fmov or pfmov 
pseudoinstruction with the .ds suffix. 

fmov.ds src1, rdest                 (Convert Double to Single)
        Equivalent to fadd.ds src1, f0, rdest

pfmov.ds src1, rdest                (Pipelined Convert Double to Single)
        Equivalent to pfadd.ds src1, f0, rdest

Conversion from single to double is accomplished by fadd.sd or pfadd.sd with f0 as one input 
operand. In assembly language, this conversion is specified by the fmov or pfmov pseudoinstruction 
with the .sd suffix. 

fmov.sd src1, rdest                 (Convert Single to Double)
        Equivalent to fadd.sd src1, f0, rdest

pfmov.sd src1, rdest                (Pipelined Convert Single to Double)
        Equivalent to pfadd.sd src1, f0, rdest
6.4.2 Floating-Point Compares 



pfgt.p src1, src2, rdest            (Pipelined Floating-Point Greater-Than Compare)
        (Assembler clears R-bit of instruction)
        rdest ← last A-stage result
        CC set if src1 > src2, else cleared
        Advance A pipeline one stage
        A pipeline first stage is undefined, but no result exception occurs

pfle.p src1, src2, rdest            (Pipelined F-P Less-Than or Equal Compare)
        (Assembler pseudo-operation, identical to pfgt.p except that assembler sets R-bit of instruction.)
        rdest ← last A-stage result
        CC cleared if src1 ≤ src2, else set
        Advance A pipeline one stage
        A pipeline first stage is undefined, but no result exception occurs

pfeq.p src1, src2, rdest            (Pipelined Floating-Point Equal Compare)
        rdest ← last A-stage result
        CC set if src1 = src2, else cleared
        Advance A pipeline one stage
        A pipeline first stage is undefined, but no result exception occurs



There are no corresponding scalar versions of the floating-point compare instructions. The 
pipelined instructions can be used either within a sequence of pipelined instructions or within a 
sequence of nonpipelined (scalar) instructions. 

pfgt.p should be used for A > B and A < B comparisons, pfle.p should be used for A ≥ B and 
A ≤ B comparisons, and pfeq.p should be used for A = B and A ≠ B comparisons. 

Traps 

Compares never cause result exceptions when the result is stored. They do trap on invalid input 
operands. 

Programming Notes 

The only difference between pfgt.p and pfle.p is the encoding of the R bit of the instruction and 
the way in which the trap handler treats unordered compares. The R bit normally indicates result 
precision, but in the case of these instructions it is not used for that purpose. The trap handler can 
examine the R bit to help determine whether an unordered compare should set or clear CC to 
conform with the IEEE standard for unordered compares. For pfgt.p and pfeq.p, it should clear 
CC; for pfle.p, it should set CC. 

For best performance, a bc or bnc instruction should not directly follow a pfgt or pfeq instruction. 
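The CC conventions, including the trap handler's treatment of unordered (NaN) operands, can be checked against IEEE semantics with a Python sketch. Python's float comparisons already follow IEEE rules, so they serve as the reference. Two details are assumptions of this sketch: that A < B and A ≥ B are evaluated by swapping the operands of pfgt.p and pfle.p, and that a relation may be read from a clear CC (as with bnc). The sketch shows why pfle.p must set CC on an unordered compare. CC is the complement of "src1 ≤ src2", and that relation is false for NaN.

```python
import math
import operator


def _unordered(a: float, b: float) -> bool:
    return math.isnan(a) or math.isnan(b)


def pfgt_cc(src1: float, src2: float) -> bool:
    """CC after pfgt.p; for an unordered compare the trap handler clears CC."""
    return False if _unordered(src1, src2) else src1 > src2


def pfle_cc(src1: float, src2: float) -> bool:
    """CC after pfle.p: cleared if src1 <= src2, else set; unordered sets CC."""
    return True if _unordered(src1, src2) else not (src1 <= src2)


def pfeq_cc(src1: float, src2: float) -> bool:
    """CC after pfeq.p; for an unordered compare the trap handler clears CC."""
    return False if _unordered(src1, src2) else src1 == src2


# Each relation: which compare to use and whether it holds on CC set or CC clear.
RELATIONS = {
    ">":  lambda a, b: pfgt_cc(a, b),
    "<":  lambda a, b: pfgt_cc(b, a),        # operands swapped (assumption)
    "<=": lambda a, b: not pfle_cc(a, b),    # relation holds when CC is clear
    ">=": lambda a, b: not pfle_cc(b, a),    # operands swapped, CC clear
    "==": lambda a, b: pfeq_cc(a, b),
    "!=": lambda a, b: not pfeq_cc(a, b),
}

IEEE = {">": operator.gt, "<": operator.lt, "<=": operator.le,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def agrees_with_ieee(a: float, b: float) -> bool:
    return all(RELATIONS[r](a, b) == IEEE[r](a, b) for r in RELATIONS)
```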

6.4.3 Floating-Point to Integer Conversion 



fix.p src1, rdest                   (Floating-Point to Integer Conversion)
        rdest ← 64-bit value with low-order 32 bits equal to integer part of src1 rounded

pfix.p src1, rdest                  (Pipelined Floating-Point to Integer Conversion)
        rdest ← last A-stage result
        Advance A pipeline one stage
        A pipeline first stage ← 64-bit value with low-order 32 bits equal to integer part
                                 of src1 rounded

ftrunc.p src1, rdest                (Floating-Point to Integer Truncation)
        rdest ← 64-bit value with low-order 32 bits equal to integer part of src1

pftrunc.p src1, rdest               (Pipelined Floating-Point to Integer Truncation)
        rdest ← last A-stage result
        Advance A pipeline one stage
        A pipeline first stage ← 64-bit value with low-order 32 bits equal to integer part
                                 of src1



The instructions fix and pfix must specify double-precision results. The low-order 32 bits of the 
result contain the integer part of src1 represented in twos-complement form. For fix and pfix, the 
integer is selected according to the rounding mode specified by RM in the fsr. 

The instructions ftrunc and pftrunc are identical to fix and pfix, except that RM is not consulted; 
rounding is always toward zero. Src2 should contain zero. 

Traps 

The instructions fix, pfix, ftrunc, and pftrunc signal overflow if the integer part of src1 is larger 
than what can be represented as a 32-bit twos-complement integer. Underflow and inexact are 
never signaled. 
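The difference between fix (which follows RM) and ftrunc (which always rounds toward zero) can be shown with a Python sketch. It computes only the low-order 32 bits of the result. The mode names are descriptive labels, not the fsr RM encodings. The overflow check is applied to the rounded value, which is an assumption of this sketch.

```python
import math

# RM modes by descriptive name (labels only, not the fsr RM bit encodings).
ROUNDING = {
    "nearest": round,    # Python's round() on a float rounds ties to even
    "down": math.floor,  # toward minus infinity
    "up": math.ceil,     # toward plus infinity
    "chop": math.trunc,  # toward zero
}

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def fix_low32(src1: float, rm: str = "nearest") -> int:
    """Low-order 32 bits (twos complement) of a fix.p result under mode rm."""
    n = ROUNDING[rm](src1)
    if not INT32_MIN <= n <= INT32_MAX:
        raise OverflowError("integer part not representable in 32 bits")
    return n & 0xFFFFFFFF  # underflow and inexact are never signaled


def ftrunc_low32(src1: float) -> int:
    """ftrunc.p ignores RM and always rounds toward zero."""
    return fix_low32(src1, "chop")
```

For example, -2.5 converts to 0xFFFFFFFD (that is, -3) under round-down, while ftrunc of -2.7 gives 0xFFFFFFFE (that is, -2).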

6.5 DUAL OPERATION INSTRUCTIONS 

The instructions pfam, pfsm, pfmam, and pfmsm initiate both an adder (A-unit) operation and a 
multiplier (M-unit) operation. The source precision specified by .p applies to the source operands 
of the multiplication. The result precision normally specified by .p controls in this case both the 
precision of the source operands of the addition or subtraction and the precision of all the results. 



pfam.p src1, src2, rdest            (Pipelined Floating-Point Add and Multiply)
        rdest ← last A-stage result
        Advance A and M pipeline one stage (operands accessed before advancing pipeline)
        A pipeline first stage ← A-op1 + A-op2
        M pipeline first stage ← M-op1 × M-op2

pfsm.p src1, src2, rdest            (Pipelined Floating-Point Subtract and Multiply)
        rdest ← last A-stage result
        Advance A and M pipeline one stage (operands accessed before advancing pipeline)
        A pipeline first stage ← A-op1 - A-op2
        M pipeline first stage ← M-op1 × M-op2

pfmam.p src1, src2, rdest           (Pipelined Floating-Point Multiply with Add)
        rdest ← last M-stage result
        Advance A and M pipeline one stage (operands accessed before advancing pipeline)
        A pipeline first stage ← A-op1 + A-op2
        M pipeline first stage ← M-op1 × M-op2

pfmsm.p src1, src2, rdest           (Pipelined Floating-Point Multiply with Subtract)
        rdest ← last M-stage result
        Advance A and M pipeline one stage (operands accessed before advancing pipeline)
        A pipeline first stage ← A-op1 - A-op2
        M pipeline first stage ← M-op1 × M-op2



Suffix    Precision of Source      Precision of Source of Add or Subtract
          of Multiplication        and Result of All Operations
.ss       single                   single
.sd       single                   double
.dd       double                   double



The instructions pfmam and pfmsm are identical to pfam and pfsm except that pfmam and 
pfmsm transfer the last stage result of the multiplier to rdest (the adder result is lost). 

Six operands are required, but the instruction format specifies only three operands; therefore, there 
are special provisions for specifying the operands. These special provisions consist of: 

• Three special registers (KR, KI, and T) that can store values from one dual-operation 
instruction and supply them as inputs to subsequent dual-operation instructions. 

- The constant registers KR and KI can store the value of src1 and subsequently supply 
that value to the M-pipeline in place of src1. 

- The transfer register T can store the last-stage result of the multiplier pipeline and 
subsequently supply that value to the adder pipeline in place of src1. 

• A four-bit data-path control field in the opcode (DPC) that specifies the operands and loading 
of the special registers. 

1. Operand-1 of the multiplier can be KR, KI, or src1. 

2. Operand-2 of the multiplier can be src2, the last-stage result of the multiplier pipeline, 
or the last-stage result of the adder pipeline. 

3. Operand-1 of the adder can be src1, the T-register, the last-stage result of the multiplier 
pipeline, or the last-stage result of the adder pipeline. 

4. Operand-2 of the adder can be src2, the last-stage result of the multiplier pipeline, or the 
last-stage result of the adder pipeline. 

Figure 6-2 shows all the possible data paths surrounding the adder and multiplier. Table 6-1 
shows how the various encodings of DPC select different data paths. Figure 6-3 illustrates the 
actual data path for each dual-operation instruction. 



[Diagram: src1, src2, KR, KI, and T feed op1/op2 of the multiplier unit and the adder unit; the unit results feed back as operands and to rdest]

Figure 6-2. Dual-Operation Data Paths 
Table 6-1. DPC Encoding

DPC    PFAM       PFSM       M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   r2p1       r2s1       KR       src2       src1       M result   No     No
0001   r2pt       r2st       KR       src2       T          M result   No     Yes
0010   r2ap1      r2as1      KR       src2       src1       A result   Yes    No
0011   r2apt      r2ast      KR       src2       T          A result   Yes    Yes
0100   i2p1       i2s1       KI       src2       src1       M result   No     No
0101   i2pt       i2st       KI       src2       T          M result   No     Yes
0110   i2ap1      i2as1      KI       src2       src1       A result   Yes    No
0111   i2apt      i2ast      KI       src2       T          A result   Yes    Yes
1000   rat1p2     rat1s2     KR       A result   src1       src2       Yes    No
1001   m12apm     m12asm     src1     src2       A result   M result   No     No
1010   ra1p2      ra1s2      KR       A result   src1       src2       No     No
1011   m12ttpa    m12ttsa    src1     src2       T          A result   Yes    No
1100   iat1p2     iat1s2     KI       A result   src1       src2       Yes    No
1101   m12tpm     m12tsm     src1     src2       T          M result   No     No
1110   ia1p2      ia1s2      KI       A result   src1       src2       No     No
1111   m12tpa     m12tsa     src1     src2       T          A result   No     No

DPC    PFMAM      PFMSM      M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   mr2p1      mr2s1      KR       src2       src1       M result   No     No
0001   mr2pt      mr2st      KR       src2       T          M result   No     Yes
0010   mr2mp1     mr2ms1     KR       src2       src1       M result   Yes    No
0011   mr2mpt     mr2mst     KR       src2       T          M result   Yes    Yes
0100   mi2p1      mi2s1      KI       src2       src1       M result   No     No
0101   mi2pt      mi2st      KI       src2       T          M result   No     Yes
0110   mi2mp1     mi2ms1     KI       src2       src1       M result   Yes    No
0111   mi2mpt     mi2mst     KI       src2       T          M result   Yes    Yes
1000   mrmt1p2    mrmt1s2    KR       M result   src1       src2       Yes    No
1001   mm12mpm    mm12msm    src1     src2       M result   M result   No     No
1010   mrm1p2     mrm1s2     KR       M result   src1       src2       No     No
1011   mm12ttpm   mm12ttsm   src1     src2       T          M result   Yes    No
1100   mimt1p2    mimt1s2    KI       M result   src1       src2       Yes    No
1101   mm12tpm    mm12tsm    src1     src2       T          M result   No     No
1110   mim1p2     mim1s2     KI       M result   src1       src2       No     No
1111   mm12tpm    mm12tsm    src1     src2       T          M result   No     No

* If K-load is set, KR is loaded when operand-1 of the multiplier is KR; KI is loaded when operand-1 of the multiplier is KI.
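The pfam/pfsm half of Table 6-1 can be turned into data and used to step a small dual-operation model. The class name is hypothetical, and two behaviors are assumptions not stated in the text. First, a K load copies src1 into KR or KI, and the new value feeds the same instruction's multiply. Second, A-unit op1 = T uses T's value from before the instruction's own T load. In the model, T is loaded from the last M-stage result, and all operands are read before either pipeline advances, as the instruction descriptions state.

```python
from __future__ import annotations

# Table 6-1 (pfam/pfsm half): DPC -> (M op1, M op2, A op1, A op2, T load, K load).
# "M" and "A" denote the last-stage results of the multiplier and adder.
DPC_PFAM = {
    0b0000: ("KR", "src2", "src1", "M", False, False),
    0b0001: ("KR", "src2", "T", "M", False, True),
    0b0010: ("KR", "src2", "src1", "A", True, False),
    0b0011: ("KR", "src2", "T", "A", True, True),
    0b0100: ("KI", "src2", "src1", "M", False, False),
    0b0101: ("KI", "src2", "T", "M", False, True),
    0b0110: ("KI", "src2", "src1", "A", True, False),
    0b0111: ("KI", "src2", "T", "A", True, True),
    0b1000: ("KR", "A", "src1", "src2", True, False),
    0b1001: ("src1", "src2", "A", "M", False, False),
    0b1010: ("KR", "A", "src1", "src2", False, False),
    0b1011: ("src1", "src2", "T", "A", True, False),
    0b1100: ("KI", "A", "src1", "src2", True, False),
    0b1101: ("src1", "src2", "T", "M", False, False),
    0b1110: ("KI", "A", "src1", "src2", False, False),
    0b1111: ("src1", "src2", "T", "A", False, False),
}


class DualOpUnit:
    """Sketch of pfam/pfsm steps: route operands, then advance both pipelines."""

    def __init__(self, m_depth: int = 3, a_depth: int = 3) -> None:
        self.m = [0.0] * m_depth   # multiplier stages, index 0 = first stage
        self.a = [0.0] * a_depth   # adder stages, index 0 = first stage
        self.KR = self.KI = self.T = 0.0

    def step(self, dpc: int, src1: float, src2: float, subtract: bool = False) -> float:
        m_op1, m_op2, a_op1, a_op2, t_load, k_load = DPC_PFAM[dpc]
        m_last, a_last = self.m[-1], self.a[-1]   # operands read before advancing
        if k_load:
            setattr(self, m_op1, src1)            # m_op1 is "KR" or "KI" (assumption above)
        value = {"src1": src1, "src2": src2, "KR": self.KR, "KI": self.KI,
                 "T": self.T, "M": m_last, "A": a_last}
        product = value[m_op1] * value[m_op2]
        if subtract:
            total = value[a_op1] - value[a_op2]
        else:
            total = value[a_op1] + value[a_op2]
        if t_load:
            self.T = m_last                       # T <- last M-stage result
        self.m = [product] + self.m[:-1]
        self.a = [total] + self.a[:-1]
        return a_last                             # pfam/pfsm store the last A-stage result
```

With KR holding 2.0 and DPC 0000 (r2p1), feeding x = 1, 2, 3 and then s = 10, 20, 30 produces 12, 24, and 36 (s + 2x) three issues later. This shows both the KR × src2 path and the pipeline latency.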
[Diagrams: the src1, src2, and rdest data paths of each dual-operation instruction]

Figure 6-3. Data Paths by Instruction (1 of 8 through 8 of 8) 
Note that the mnemonics pfam.p, pfsm.p, pfmam.p, and pfmsm.p are never used as such in the
assembly language; these mnemonics are used by this manual to designate classes of related
instructions. Each value of DPC has a unique mnemonic associated with it. An initial "m"
distinguishes the pfmam.p and pfmsm.p classes from the pfam.p and pfsm.p classes. Figure
6-4 explains how the rest of these mnemonics are derived.



Series 1 - Assumes the M-unit operand-2 is src2
(mnemonic components appear in this order, left to right)

    M-unit op1 {r, i}:          r = KR; i = KI
    M-unit op2 {2}:             2 = src2
    A-unit op2 {a, m, null}:    null = M-result; m = M-result, load T; a = A-result, load T
    Add/Subtract {p, s}:        p = add (plus); s = subtract
    A-unit op1 {1, t}:          1 = src1; t = T, load K

Series 2 - Assumes no K loading
Not all combinations are possible. Refer to Table 6-1 for possible combinations.
(mnemonic components appear in this order, left to right)

    M-unit op1 and op2 {ra, rm, ia, im, m12}:
                                ra = KR, A-result; rm = KR, M-result;
                                ia = KI, A-result; im = KI, M-result;
                                m12 = src1, src2
    Load T {t, null}:           t = yes
    A-unit op1 {1, a, m, t}:    1 = src1; a = A-result; m = M-result; t = T
    Add/Subtract {p, s}:        p = add (plus); s = subtract
    A-unit op2 {2, m, a}:       2 = src2; m = M-result; a = A-result

Figure 6-4. Data Path Mnemonics
Programming Notes

When the M-unit op1 is src1, src1 must not be the same as rdest. For best performance when the
prior operation is scalar and the M-unit op1 is src1, src1 should not be the same as the rdest of the
prior operation.

6.6 GRAPHICS UNIT 

The graphics unit operates on 32- and 64-bit integers stored in the floating-point register file. This 
unit supports long-integer arithmetic and 3-D graphics drawing algorithms. Operations are provided 
for pixel shading and for hidden surface elimination using a Z-buffer. 

Programming Notes 

In a pipelined graphics operation, if rdest is not f0, then rdest must not be the same as src1 or
src2.

For best performance, the result of a scalar operation should not be a source operand in the next 
instruction, unless the next instruction is a multiplier or adder operation. 



6.6.1 Long-Integer Arithmetic 



fisub.w src1, src2, rdest        (Long-Integer Subtract)

    rdest ← src1 - src2

pfisub.w src1, src2, rdest       (Pipelined Long-Integer Subtract)

    rdest ← last-stage I-result
    last-stage I-result ← src1 - src2

fiadd.w src1, src2, rdest        (Long-Integer Add)

    rdest ← src1 + src2

pfiadd.w src1, src2, rdest       (Pipelined Long-Integer Add)

    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2

.w = .ss (32 bits) or .dd (64 bits)

The fiadd and fisub instructions implement arithmetic on integers up to 64 bits wide. Such
integers are loaded into the same registers that are normally used for floating-point operations.
These instructions neither set CC nor cause floating-point traps on overflow.
Programming Notes

In assembly language, fiadd and pfiadd are used to implement the fmov and pfmov
pseudoinstructions.



fmov.ss   src1, rdest    (Single Move)              Equivalent to fiadd.ss src1, f0, rdest
pfmov.ss  src1, rdest    (Pipelined Single Move)    Equivalent to pfiadd.ss src1, f0, rdest
fmov.dd   src1, rdest    (Double Move)              Equivalent to fiadd.dd src1, f0, rdest
pfmov.dd  src1, rdest    (Pipelined Double Move)    Equivalent to pfiadd.dd src1, f0, rdest
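As a rough illustration (a C model, not Intel's definition), the non-trapping behavior of fiadd and fisub can be mimicked with unsigned arithmetic, which wraps modulo 2^32 or 2^64 instead of signaling overflow. The function names are hypothetical; the fmov helpers rely on f0 always reading as zero.

```c
#include <stdint.h>

/* Hypothetical C models of the graphics-unit long-integer operations.
   C unsigned arithmetic wraps modulo 2^N, mirroring the fact that fiadd and
   fisub neither set CC nor raise a floating-point overflow trap. */
static uint32_t fiadd_ss(uint32_t src1, uint32_t src2) { return src1 + src2; }
static uint32_t fisub_ss(uint32_t src1, uint32_t src2) { return src1 - src2; }
static uint64_t fiadd_dd(uint64_t src1, uint64_t src2) { return src1 + src2; }
static uint64_t fisub_dd(uint64_t src1, uint64_t src2) { return src1 - src2; }

/* f0 always reads as zero, so fmov.ss src1, rdest behaves like
   fiadd.ss src1, f0, rdest (and likewise for the .dd forms). */
static uint32_t fmov_ss(uint32_t src1) { return fiadd_ss(src1, 0u); }
static uint64_t fmov_dd(uint64_t src1) { return fiadd_dd(src1, 0u); }
```

For example, fiadd_ss(0xFFFFFFFF, 1) simply yields 0, with no indication that the result wrapped.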







6.6.2 3-D Graphics Operations 

The i860 Microprocessor supports high-performance 3-D graphics applications by supplying 
operations that assist in the following common graphics functions: 

1. Hidden surface elimination.

2. Distance interpolation. 

3. 3-D shading using intensity interpolation. 

The interpolation operations of the i860 Microprocessor support graphics applications in which 
the set of points on the surface of a solid object is represented by polygons. The distances and 
color intensities of the vertices of the polygon are known, but the distances and intensities of other 
points must be calculated by interpolation between the known values. 

Certain fields of the psr are used by the i860 Microprocessor's graphics instructions, as illustrated
in Figure 6-5.

The merge instructions are those that utilize the 64-bit MERGE register. The purpose of the 
MERGE register is to accumulate (or merge) the results of multiple-addition operations that use 
as operands the color-intensity values from pixels or distance values from a Z-buffer. The 
accumulated results can then be stored in one 64-bit operation. 

Two multiple-addition instructions and an OR instruction use the MERGE register. The addition 
instructions are designed to add interpolation values to each color-intensity field in an array of 
pixels or to each distance value in a Z-buffer. 
 31           24  23    22  21                        0
+---------------+---------+----------------------------+
|      PM       |   PS    |   (other psr fields)       |
+---------------+---------+----------------------------+
PM = PIXEL MASK; PS = PIXEL SIZE

Figure 6-5. PSR Fields for Graphics Operations
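To make the layout concrete, here is a small C sketch that extracts PM and PS from a 32-bit psr value using the bit positions shown in Figure 6-5. The helper names are hypothetical, and the mapping of PS codes to pixel sizes (0 = 8, 1 = 16, 2 = 32 bits) is an assumption here; check it against the psr description in Chapter 3.

```c
#include <stdint.h>

/* Field positions from Figure 6-5: PM occupies bits 31..24 and
   PS occupies bits 23..22 of psr. */
static unsigned psr_pm(uint32_t psr) { return (psr >> 24) & 0xFFu; }
static unsigned psr_ps(uint32_t psr) { return (psr >> 22) & 0x3u; }

/* Assumed PS encoding: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit pixels.
   Code 3 is treated as undefined and reported as 0 here. */
static unsigned pixel_size_bits(uint32_t psr) {
    switch (psr_ps(psr)) {
    case 0: return 8;
    case 1: return 16;
    case 2: return 32;
    default: return 0;
    }
}
```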
6.6.2.1 Z-BUFFER CHECK INSTRUCTIONS

Consider PM as an array of eight bits PM(0)..PM(7), where PM(0) is the least-significant bit.

fzchks src1, src2, rdest        (16-Bit Z-Buffer Check)

    Consider src1, src2, and rdest as arrays of four 16-bit fields src1(0)..src1(3),
    src2(0)..src2(3), and rdest(0)..rdest(3), where zero denotes the least-significant field.

    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0
pfzchks src1, src2, rdest       (Pipelined 16-Bit Z-Buffer Check)

    Consider src1, src2, and rdest as arrays of four 16-bit fields src1(0)..src1(3),
    src2(0)..src2(3), and rdest(0)..rdest(3), where zero denotes the least-significant field.

    PM ← PM shifted right by 4 bits
    rdest ← last-stage I-result
    FOR i = 0 to 3
    DO
        PM[i + 4] ← src2(i) ≤ src1(i) (unsigned)
        last-stage I-result(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0
fzchkl src1, src2, rdest        (32-Bit Z-Buffer Check)

    Consider src1, src2, and rdest as arrays of two 32-bit fields src1(0)..src1(1),
    src2(0)..src2(1), and rdest(0)..rdest(1), where zero denotes the least-significant field.

    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0
pfzchkl src1, src2, rdest       (Pipelined 32-Bit Z-Buffer Check)

    Consider src1, src2, and rdest as arrays of two 32-bit fields src1(0)..src1(1),
    src2(0)..src2(1), and rdest(0)..rdest(1), where zero denotes the least-significant field.

    PM ← PM shifted right by 2 bits
    rdest ← last-stage I-result
    FOR i = 0 to 1
    DO
        PM[i + 6] ← src2(i) ≤ src1(i) (unsigned)
        last-stage I-result(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0



A Z-buffer aids hidden-surface elimination by associating with each pixel a value that represents the
distance of that pixel from the viewer. When painting a point at a specific pixel location,
three-dimensional drawing algorithms calculate the distance of the point from the viewer. If the point is
farther from the viewer than the point that is already represented by the pixel, the pixel is not
updated. The i860 Microprocessor supports distance values that are either 16 or 32 bits wide.
The size of the Z-buffer values is independent of the pixel size: Z-buffer element size is controlled
by whether the 16-bit instruction fzchks or the 32-bit instruction fzchkl is used, while pixel size is
controlled by the PS field of the psr.

The instructions fzchks and fzchkl perform multiple unsigned-integer (ordinal) comparisons. The 
inputs to the instructions fzchks and fzchkl are normally taken from two arrays of values, each of 
which typically represents the distance of a point from the viewer. One array contains distances 
that correspond to points that are to be drawn; the other contains distances that correspond to 
points that have already been drawn (a Z-buffer). The instructions compare the distances of the 
points to be drawn against the values in the Z-buffer and set bits of PM to indicate which distances 
are smaller than those in the Z-buffer. Previously calculated bits in PM are shifted right so that 
consecutive fzchks or fzchkl instructions accumulate their results in PM. Subsequent pst.d 
instructions use the bits of PM to determine which pixels to update. 
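The following C sketch models the fzchks definition given earlier in this section, so the PM bookkeeping can be traced by hand. It is an illustration under stated assumptions, not Intel's implementation: PM is held in a separate byte passed by pointer, the name fzchks_model is hypothetical, and the MERGE side effect is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of fzchks. src1, src2, and the result are treated as
   four unsigned 16-bit fields; field 0 is the least-significant.
   *pm stands in for the PM field of psr. */
static uint64_t fzchks_model(uint64_t src1, uint64_t src2, uint8_t *pm) {
    uint8_t newpm = (uint8_t)(*pm >> 4);   /* PM shifted right by 4 bits */
    uint64_t rdest = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t a = (uint16_t)(src1 >> (16 * i));
        uint16_t b = (uint16_t)(src2 >> (16 * i));
        if (b <= a)                        /* src2(i) <= src1(i), unsigned */
            newpm |= (uint8_t)(1u << (i + 4));
        uint16_t smaller = (b < a) ? b : a;
        rdest |= (uint64_t)smaller << (16 * i);
    }
    *pm = newpm;
    return rdest;
}
```

Because each call first shifts PM right by 4 bits, two consecutive calls leave the results of the first call in PM(0)..PM(3) and those of the second in PM(4)..PM(7).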

6.6.2.2 PIXEL ADD 



faddp src1, src2, rdest          (Add with Pixel Merge)

    rdest ← src1 + src2
    Shift and load MERGE register from src1 + src2 as defined in Table 6-2

pfaddp src1, src2, rdest         (Pipelined Add with Pixel Merge)

    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2
    Shift and load MERGE register from src1 + src2 as defined in Table 6-2
The faddp instruction implements interpolation of color intensities. The 8- and 16-bit pixel formats
use 16-bit intensity interpolation. Being a 64-bit instruction, faddp does four 16-bit interpolations
at a time. The 32-bit pixel formats use 32-bit intensity interpolation; consequently, faddp performs
them two at a time. By itself, faddp implements linear interpolation; combined with fiadd,
nonlinear interpolation can be achieved.





Table 6-2. FADDP MERGE Update

Pixel Size    Fields Loaded From Result into MERGE    Right Shift Amount
(from PS)                                             (Field Size)

 8            63..56, 47..40, 31..24, 15..8           8
16            63..58, 47..42, 31..26, 15..10          6
32            63..56, 31..24                          8
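Table 6-2 can be read as a mask-and-shift rule. The hypothetical helper below sketches that rule in C: MERGE is shifted right by the field size, and the selected fields of the sum then overwrite the corresponding bits. It takes the 64-bit sum as an input, so it makes no claim about how carries propagate between fields inside faddp.

```c
#include <stdint.h>

/* Hypothetical model of the MERGE update in Table 6-2.
   merge   - current MERGE contents
   sum     - the 64-bit result of src1 + src2 (taken as given)
   ps_bits - pixel size selected by PS: 8, 16, or 32 */
static uint64_t faddp_merge(uint64_t merge, uint64_t sum, unsigned ps_bits) {
    uint64_t mask;
    unsigned shift;
    switch (ps_bits) {
    case 8:  mask = 0xFF00FF00FF00FF00ull; shift = 8; break; /* 63..56, 47..40, ... */
    case 16: mask = 0xFC00FC00FC00FC00ull; shift = 6; break; /* 63..58, 47..42, ... */
    default: mask = 0xFF000000FF000000ull; shift = 8; break; /* 63..56, 31..24 */
    }
    /* Shift MERGE right by the field size, then load the selected fields. */
    return ((merge >> shift) & ~mask) | (sum & mask);
}
```

With ps_bits = 8, two consecutive calls interleave the integer bytes of even and odd pixels; with ps_bits = 16, three calls leave the first 6-bit value truncated to 4 bits at the bottom of each 16-bit pixel, matching the behavior described for the 8- and 16-bit pixel formats.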



Figure 6-6 illustrates faddp when PS is set for 8-bit pixels. Since faddp adds 16-bit values in this
case, each value can be treated as a fixed-point real number with an 8-bit integer portion and an
8-bit fractional portion. The real numbers are rounded to 8 bits by truncation when they are loaded
into the MERGE register. With each faddp instruction, the MERGE register is shifted right by 8
bits. Two faddp instructions should be executed consecutively, one to interpolate for even-numbered
pixels, the next to interpolate for odd-numbered pixels. The shifting of the MERGE
register has the effect of merging the results of the two faddp instructions.

Figure 6-6. FADDP with 8-Bit Pixels

Figure 6-7 illustrates faddp when PS is set for 16-bit pixels. Since faddp adds 16-bit values in
this case, each value can be treated as a fixed-point real number with a 6-bit integer portion and
a 10-bit fractional portion. The real numbers are rounded to 6 bits by truncation when they are
loaded into the MERGE register. With each faddp, the MERGE register is shifted right by 6 bits.
Normally, three faddp instructions are executed consecutively, one for each color represented in
a pixel. The shifting of MERGE causes the results of consecutive faddp instructions to be
accumulated in the MERGE register. Note that each one of the first set of 6-bit values loaded into
MERGE is further truncated to 4 bits when it is shifted to the extreme right of the 16-bit pixel.

Figure 6-7. FADDP with 16-Bit Pixels
Figure 6-8 illustrates faddp when PS is set for 32-bit pixels. Since faddp adds 32-bit values in
this case, each value can be treated as a fixed-point real number with an 8-bit integer portion and
a 24-bit fractional portion. The real numbers are rounded to 8 bits by truncation when they are
loaded into the MERGE register. With each faddp, the MERGE register is shifted right by 8 bits.
Normally, three faddp instructions are executed consecutively, one for each color represented in
a pixel. The shifting of MERGE causes the results of consecutive faddp instructions to be
accumulated in the MERGE register.

Figure 6-8. FADDP with 32-Bit Pixels



6.6.2.3 Z-BUFFER ADD

The faddz instruction implements linear interpolation of distance values such as those that form a
Z-buffer. With faddz, 16-bit Z-buffers can use 32-bit distance interpolation, as Figure 6-9
illustrates. Since faddz adds 32-bit values, each value can be treated as a fixed-point real number
with a 16-bit integer portion and a 16-bit fractional portion. The real numbers are rounded to 16
bits by truncation when they are loaded into the MERGE register. With each faddz, the MERGE
register is shifted right by 16 bits. Normally, two faddz instructions are executed consecutively.
The shifting of MERGE causes the results of consecutive faddz instructions to be accumulated in
the MERGE register.



faddz src1, src2, rdest          (Add with Z Merge)

    rdest ← src1 + src2
    Shift MERGE right 16 and load fields 31..16 and 63..48 from src1 + src2

pfaddz src1, src2, rdest         (Pipelined Add with Z Merge)

    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2
    Shift MERGE right 16 and load fields 31..16 and 63..48 from src1 + src2
Figure 6-9. FADDZ with 16-Bit Z-Buffer
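A comparable sketch for faddz, again with a hypothetical helper name and with the 64-bit sum taken as an input: each 32-bit half of the sum is treated as a 16.16 fixed-point distance, and only the integer halves (bits 63..48 and 31..16) are loaded, so two consecutive calls pack four 16-bit Z values into MERGE.

```c
#include <stdint.h>

/* Hypothetical model of the faddz MERGE update: shift MERGE right 16 bits,
   then load fields 63..48 and 31..16 from the sum src1 + src2. */
static uint64_t faddz_merge(uint64_t merge, uint64_t sum) {
    const uint64_t mask = 0xFFFF0000FFFF0000ull;
    return ((merge >> 16) & ~mask) | (sum & mask);
}
```

For instance, sums holding the 16.16 distances {1.5, 2.5} and then {3.0, 4.0} leave MERGE holding the truncated 16-bit values 3, 1, 4, 2 (from most to least significant field).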




32-bit Z-buffers can use 32-bit or 64-bit distance interpolation. For 32-bit interpolation, no special
instructions are required: two 32-bit adds can be performed as a single 64-bit add instruction. The
fact that data is carried from the low-order 32 bits into the high-order 32 bits may introduce an
insignificant distortion into the interpolation.

For 32-bit Z-buffers, 64-bit distance interpolation is implemented (as Figure 6-10 shows) with two
64-bit fiadd instructions. The merging is implemented with the 32-bit move fmov.ss src1, rdest.
Figure 6-10. 64-Bit Distance Interpolation
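The distortion mentioned for 32-bit interpolation comes from a single carry bit. The C sketch below (hypothetical helper names) compares one 64-bit add against two independent 32-bit adds; the results differ only when the low-order half overflows, and then by exactly one unit in the least-significant bit of the high-order distance.

```c
#include <stdint.h>

/* Two packed 32-bit distances added with one 64-bit add: a carry out of
   the low half propagates into the high half. */
static uint64_t packed_add64(uint64_t a, uint64_t b) { return a + b; }

/* Reference result: two independent 32-bit adds, no carry between halves. */
static uint64_t split_add32(uint64_t a, uint64_t b) {
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return ((uint64_t)hi << 32) | lo;
}
```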



6.6.2.4 OR WITH MERGE REGISTER 

For intensity interpolation, the form instruction fetches the partially completed pixels from the 
MERGE register, sets any additional bits that may be needed in the pixels (e.g. texture values), 
and loads the result into a floating point register. Srcl should contain zero. 
For distance interpolation or for intensity interpolation that does not require further modification
of the value in the MERGE register, the src1 operand of form may be f0, thereby causing the
instruction to simply load the MERGE register into a floating-point register.



form src1, rdest                 (OR with MERGE Register)

    rdest ← src1 OR MERGE
    MERGE ← 0

pform src1, rdest                (Pipelined OR with MERGE Register)

    rdest ← last-stage I-result
    last-stage I-result ← src1 OR MERGE
    MERGE ← 0
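A minimal C sketch of form, with a hypothetical name and under the assumption that MERGE is cleared once it has been read: the call ORs src1 into the accumulated MERGE value and returns the completed 64-bit result, ready to be stored with a single 64-bit operation.

```c
#include <stdint.h>

/* Hypothetical model of form: rdest = src1 OR MERGE, then MERGE is cleared
   (assumed) so the next sequence of merge instructions starts from zero. */
static uint64_t form_model(uint64_t src1, uint64_t *merge) {
    uint64_t rdest = src1 | *merge;
    *merge = 0;
    return rdest;
}
```

Passing 0 for src1 corresponds to using f0, which simply copies MERGE into the destination.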





6.7 TRANSFER F-P TO INTEGER REGISTER 



fxfr src1, ireg        (Transfer F-P to Integer Register)

    ireg ← src1

The 32-bit floating-point register selected by src1 is stored into the (32-bit) integer register
selected by ireg. Assemblers and compilers should set src2 to zero.

Programming Notes 

This scalar instruction is performed by the graphics unit. When it is executed, the result in the 
graphics-unit pipeline is lost. However, executing this instruction does not impact performance, 
even if the next instruction is a pipelined operation whose rdest is nonzero (refer to section 6.2). 

For best performance, ireg should not be referenced in the next instruction, and srcl should not 
reference the result of the prior instruction if the prior instruction is scalar. 

6.8 DUAL-INSTRUCTION MODE 

The i860 Microprocessor can execute a floating-point and a core instruction in parallel. Such 
parallel execution is called dual-instruction mode. When executing in dual-instruction mode, the 
instruction sequence consists of 64-bit aligned instructions with a floating-point instruction in the 
lower 32 bits and a core instruction in the upper 32 bits. 

Programmers specify dual-instruction mode either by including the d. prefix in the mnemonic of a
floating-point instruction or by using the assembler directives .dual ... .enddual. Both of these
specifications cause the D-bit of floating-point instructions to be set. If the i860 Microprocessor is
executing in single-instruction mode and encounters a floating-point instruction with the D-bit set,
one more 32-bit instruction is executed before dual-mode execution begins. If the i860
Microprocessor is executing in dual-instruction mode and a floating-point instruction is encountered
with a clear D-bit, then one more pair of instructions is executed before resuming
single-instruction mode. Figure 6-11 illustrates two variations of this sequence of events: one for
extended sequences of dual instructions and one for a single instruction pair.
Figure 6-11. Dual-Instruction Mode Transitions (1 of 2)



When a 64-bit dual-instruction pair sequentially follows a delayed branch instruction in
dual-instruction mode, both 32-bit instructions are executed.

The recommended floating-point NOP for dual-instruction mode is shrd r0, r0, r0. Even though
this is a core instruction, bit 9 is interpreted as the dual-instruction mode control bit. In assembly
language, this instruction is specified as fnop or d.fnop. Traps are not reported on fnop. Because
it is a core instruction, d.fnop cannot be used to initiate entry into dual-instruction mode.

6.8.1 Core and Floating-Point Instruction Interaction 

1. If one of the branch-on-condition instructions bc or bnc is paired with a floating-point
compare, the branch tests the value of the condition code prior to the compare.












Figure 6-11. Dual-Instruction Mode Transitions (2 of 2): Temporary Dual-Instruction Mode

2. If an ixfr, fld, or pfld loads the same register as a source operand in the floating-point
instruction, the floating-point instruction references the register value before the load updates
it.

3. An fst or pst that stores a register that is the destination register of the companion pipelined
floating-point operation will store the result of the companion operation.

4. An fxfr instruction that transfers to a register referenced by the companion core instruction
will update the register after the core instruction accesses the register. The destination of the
core instruction will not be updated if it is an integer register. Likewise, if the core
instruction uses autoincrement indexing, the index register will not be updated.

5. When the core instruction sets CC and the floating-point instruction is pfgt or pfeq, CC is set
according to the result of the pfgt or pfeq.

6.8.2 Dual-Instruction Mode Restrictions 

1. The result of placing a core instruction in the low-order 32 bits or a floating-point instruction
in the high-order 32 bits is not defined (except for shrd r0, r0, r0, which is interpreted
as fnop).

2. A floating-point instruction that has the D-bit set must be aligned on a 64-bit boundary (i.e.
the three least-significant bits of its address must be zero). This applies as well to the initial
32-bit floating-point instruction that triggers the transition into dual-instruction mode, but
does not apply to the following instruction.

3. When the floating-point operation is scalar and the core operation is fst or pst, the store
should not reference the result register of the floating-point operation. When the core
operation is pst, the floating-point instruction cannot be (p)fzchks or (p)fzchkl.

4. When the core instruction of a dual-mode pair is a control-transfer operation and the previous
instruction had the D-bit set, the floating-point instruction must also have the D-bit set. In
other words, an exit from dual-instruction mode cannot be initiated (first instruction pair
without D-bit set) when the core instruction is a control-transfer instruction.

5. When the core operation is ld.c or st.c, the floating-point operation must be d.fnop.

6. When the floating-point operation is fxfr, the core instruction cannot be ld, ld.c, st, st.c, call,
ixfr, or any instruction that updates an integer register (including autoincrement indexing).

7. In dual-instruction mode when the core instruction is an indirect branch, the psr trap bits
cannot be set.

8. When the core operation is bc.t or bnc.t, the floating-point operation cannot be pfeq or pfgt.
The floating-point operation in the sequentially following instruction pair cannot be pfeq
or pfgt, either.

9. A transition to or from dual-instruction mode cannot be initiated on the instruction following
a bri.
Chapter 7
Traps and Interrupts



Traps are caused by exceptional conditions detected in programs or by external interrupts. Traps 
cause interruption of normal program flow to execute a special program known as a trap handler. 

7.1 TYPES OF TRAPS 

Traps are divided into the types shown in Table 7-1.

Table 7-1. Types of Traps

Type              Indication         Caused by
                  PSR     FSR        Condition                            Instruction

Instruction       IT                 Software traps                       trap, intovr
Fault                                Missing unlock                       Any

Floating-Point    FT      SE         Floating-point source exception      Any M- or A-unit except fmlow
Fault                     AO, MO     Floating-point result exception:     Any M- or A-unit except fmlow,
                          AU, MU       overflow                           pfgt, and pfeq. Reported on any
                          AI, MI       underflow                          F-P instruction plus pst, fst,
                                       inexact result                     and sometimes fld, pfld, ixfr

Instruction       IAT                Address translation exception        Any
Access Fault                         during instruction fetch

Data Access       DAT*               Load/store address translation       Any load/store
Fault                                exception
                                     Misaligned operand address           Any load/store
                                     Operand address matches db register  Any load/store

Interrupt         IN                 External interrupt

Reset             No trap bits set   Hardware RESET signal

* These cases can be distinguished by examining the operand addresses.

7.2 TRAP HANDLER INVOCATION 

This section applies to traps other than reset. When a trap occurs, execution of the current 
instruction is aborted. The instruction is restartable as described in section 7.2.2. The processor 
takes the following steps while transferring control to the trap handler: 

1. Copies U (user mode) of the psr into PU (previous U).

2. Copies IM (interrupt mode) into PIM (previous IM). 

3. Sets U to zero (supervisor mode). 
4. Sets IM to zero (interrupts disabled). This guards against further interrupts until the trap
information can be saved. 

5. If the processor is in dual instruction mode, it sets DIM; otherwise DIM is cleared. 

6. If the processor is in single-instruction mode and the next instruction will be executed in 
dual-instruction mode or if the processor is in dual-instruction mode and the next instruction 
will be executed in single-instruction mode, DS is set; otherwise, it is cleared. 

7. The appropriate trap type bits in psr and epsr are set (IT, IN, IAT, DAT, FT, IL). Several
bits may be set if the corresponding trap conditions occur simultaneously.

8. An address is placed in the fault instruction register (fir) to help locate the trapped instruction.
In single-instruction mode, the address in fir is the address of the trapped instruction itself. In
dual-instruction mode, the address in fir is that of the floating-point half of the dual
instruction. If an instruction- or data-access fault occurred, the associated core instruction is
the high-order half of the dual instruction (fir + 4). In dual-instruction mode, when a data-access
fault occurs in the absence of other trap conditions, the floating-point half of the dual
instruction will already have been executed (except in the case of the fxfr instruction).

The processor begins executing the trap handler by transferring execution to virtual address
0xFFFFFF00. The trap handler begins execution in single-instruction mode. The trap handler
must examine the trap-type bits in psr (IT, IN, IAT, DAT, FT) and epsr (IL) to determine the
cause or causes of the trap.
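The first six steps of trap-handler invocation amount to a handful of bit copies. The sketch below models them in C with a hypothetical struct of booleans rather than the real psr bit positions, which are deliberately not reproduced here; it is an aid to reading the list, not a description of the hardware.

```c
#include <stdbool.h>

/* Hypothetical boolean view of the psr bits touched on trap entry.
   Real bit positions are intentionally not modeled. */
typedef struct {
    bool u, pu;    /* user mode, previous user mode */
    bool im, pim;  /* interrupt mode, previous interrupt mode */
    bool dim;      /* trap occurred in dual-instruction mode */
    bool ds;       /* a single/dual mode switch was pending */
} trap_psr;

/* Steps 1-6 of trap-handler invocation (section 7.2). */
static void trap_entry(trap_psr *p, bool in_dual, bool next_mode_differs) {
    p->pu  = p->u;              /* 1. copy U into PU */
    p->pim = p->im;             /* 2. copy IM into PIM */
    p->u   = false;             /* 3. enter supervisor mode */
    p->im  = false;             /* 4. disable interrupts */
    p->dim = in_dual;           /* 5. DIM set in dual mode, else cleared */
    p->ds  = next_mode_differs; /* 6. DS set if the next instruction changes mode */
}
```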

7.2.1 Saving State 

To support nesting of traps, the trap handler must save the current state before another trap occurs. 
An interrupt stack can be implemented in software (refer to the section on stack implementation 
in Chapter 8). Interrupts can then be reenabled by clearing the trap-type bits and setting IM to the 
value of PIM. The branch-indirect instruction is sensitive to the trap-type bits; therefore, clearing 
the trap-type bits allows normal indirect branches to be performed within the trap handler. 

The items that make up the current state may include any of the following: 

1. The fir. 

2. The psr. 

3. The epsr. 

4. The fsr.

5. The MERGE register. 

6. The KR, KI, and T registers. 

7. Any of the four pipelines (refer to section 7.9). 

8. The floating-point and integer register files. 

9. The dirbase register. 
7.2.2 Returning from the Trap Handler

Returning from a trap handler involves the following steps: 

1. Restoring the pipeline states, including the fsr, KR, KI, T, and MERGE registers, where 
necessary. 

2. Subtracting src1 from src2, when a data-access fault occurred on an autoincrementing load/store
instruction and a floating-point trap did not also occur.

3. Determining where to resume execution by inspecting the instruction at fir - 4. The details
for this determination are given in section 7.2.2.1.

4. Updating psr with the value to be used after return. It may be necessary to set the KNF bit 
in psr. The requirements for KNF are given in section 7.2.2.2. 

5. Restoring the integer and floating-point register files (except for the register that holds the 
resumption address). 

6. Executing an indirect branch to the resumption address. Neither the indirect branch nor the 
following instruction may be executed in dual-instruction mode. 

7. Restoring the register that holds the resumption address. (This is executed before the delayed 
indirect branch is completed.) 

7.2.2.1 DETERMINING WHERE TO RESUME 

To determine where to resume execution upon leaving the trap handler, examine the instruction at
address fir - 4. If this instruction is not a delayed control instruction, then execution resumes at
the address in fir.

If, on the other hand, the instruction at fir - 4 is a delayed control instruction (i.e. one that
executes the next sequential instruction on branch taken), the normal action is to resume at
fir - 4 so that the control instruction (which did not finish because of the trap) is also
reexecuted. If the instruction at fir - 4 is a bla instruction, then src1 should be subtracted from
src2 before reexecuting.

The one variance from this strategy occurs when the instruction at fir - 4 is a conditional delayed
branch (bc.t or bnc.t), the instruction at fir is a pfgt, pfle, or pfeq, and a source exception has
occurred. To implement the IEEE standard for unordered compares, the trap handler may need to
change the value of CC. In this case it cannot resume at fir - 4, because the new value of CC
might cause an incorrect branch. Instead, the trap handler must interpret the conditional branch
instruction and resume at its target.

If the i860 Microprocessor was in dual-instruction mode and execution is to resume at fir - 4,
DS should be set and DIM cleared in the psr used to resume execution. Clearing DIM prevents
the floating-point instruction associated with the control instruction from being reexecuted. Setting
DS forces the processor back to dual-instruction mode after executing the control instruction.

Every code section should begin with a nop instruction so that fir - 4 is defined even if a
trap occurs on the first real instruction. Also, that nop should not be the target of any branch or
call.
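The decision procedure of section 7.2.2.1 can be summarized as a small C function. This is a sketch under assumptions: the enum and struct names are hypothetical, decoding the instruction at fir - 4 is left to the caller, and the branch target for the bc.t/bnc.t case is not computed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the instruction at fir - 4. */
typedef enum {
    PREV_NOT_DELAYED,   /* not a delayed control instruction */
    PREV_DELAYED,       /* delayed control instruction other than bla, bc.t, bnc.t */
    PREV_BLA,           /* bla: src1 must be subtracted from src2 before reexecuting */
    PREV_COND_DELAYED   /* bc.t or bnc.t */
} prev_kind;

typedef struct {
    uint32_t resume;        /* resumption address (unused if interpret_branch) */
    bool undo_bla;          /* subtract src1 from src2 before resuming */
    bool interpret_branch;  /* handler must evaluate the branch and resume at its target */
    bool set_ds_clear_dim;  /* set DS and clear DIM in the psr used to resume */
} resume_plan;

/* fir_is_compare: the instruction at fir is pfgt, pfle, or pfeq. */
static resume_plan plan_resume(uint32_t fir, prev_kind prev, bool fir_is_compare,
                               bool source_exception, bool was_dual) {
    resume_plan p = { fir, false, false, false };
    if (prev == PREV_NOT_DELAYED)
        return p;                               /* resume at fir */
    if (prev == PREV_COND_DELAYED && fir_is_compare && source_exception) {
        p.interpret_branch = true;              /* CC may be changed by the handler */
        return p;
    }
    p.resume = fir - 4;                         /* reexecute the control instruction */
    p.undo_bla = (prev == PREV_BLA);
    p.set_ds_clear_dim = was_dual;
    return p;
}
```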



7.2.2.2 SETTING KNF 

The KNF bit of psr should be set if the trapped instruction is a floating-point instruction that 
should not be reexecuted; otherwise, KNF is left unchanged. Floating-point instructions should 
not be reexecuted under the following conditions: 

• The trap was caused in dual-instruction mode by a data-access fault and there are no other
trap conditions. In this case, the floating-point instruction has already been executed.
(The one exception is the fxfr instruction. An fxfr must be reexecuted, so do not set KNF.)

• The trap was caused by a source exception on any floating-point instruction (except when
a pfgt, pfle, or pfeq follows a conditional branch, as already explained in section 7.2.2.1).
The trap handler determines the result that corresponds to the exceptional inputs; therefore,
the instruction should not be reexecuted.
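The two KNF conditions above can be condensed into one predicate. The function and parameter names are hypothetical; a false result means KNF is left unchanged, not cleared.

```c
#include <stdbool.h>

/* Hypothetical summary of section 7.2.2.2: true when KNF should be set
   because the trapped floating-point instruction must not be reexecuted. */
static bool should_set_knf(bool dual_mode, bool data_access_only, bool fp_is_fxfr,
                           bool source_exception, bool compare_after_cond_branch) {
    if (dual_mode && data_access_only && !fp_is_fxfr)
        return true;   /* floating-point half has already executed */
    if (source_exception && !compare_after_cond_branch)
        return true;   /* handler supplies the result itself */
    return false;      /* leave KNF unchanged */
}
```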

7.3 INSTRUCTION FAULT 

This fault is caused by any of the following conditions. In all cases the processor sets the IT bit 
before entering the trap handler. 

• By the trap instruction. Refer to the trap instruction in Chapter 5. 

• By the intovr instruction. The trap occurs only if OF in epsr is set when intovr is executed.
The trap handler should clear OF before returning. Refer to the intovr instruction in 
Chapter 5. 

• By the lack of an unlock instruction and a subsequent load or store within 32 instructions of
a lock. In this case IL is also set. When the trap handler finds IL set, it should scan backwards
for the lock instruction and restart at that point. The absence of a lock instruction within 32
instructions of the trap indicates a programming error. Refer to the lock instruction in
Chapter 5.

7.4 FLOATING-POINT FAULT 

The floating-point faults of the i860 Microprocessor support the floating-point exceptions defined 
by the IEEE standard as well as some other useful classes of exceptions. The i860 Microprocessor 
divides these into two classes: 

1. Source exceptions. This class includes:

• All the invalid operations defined by the IEEE standard (including operations on trapping
NaNs).

• Division by zero.

• Operations on quiet NaNs, denormals, and infinities. (These data types are implemented
by software.)

2. Result exceptions. This class includes the overflow, underflow, and inexact exceptions 
defined by the IEEE standard. 

The floating-point fault occurs only on floating-point instructions, pst, fst, fld, pfld, and ixfr.
However, no fault occurs when pst, fst, fld, pfld, or ixfr transfers an invalid floating-point format.

Software supplied by Intel provides the IEEE standard default handling for all these exceptions. 

7.4.1 Source Exception Faults 

When used as inputs to the floating-point adder or multiplier, all exceptional operands (including 
infinities, denormalized numbers and NaNs) cause a floating-point fault and set SE in the fsr. 
Source exceptions are reported on the instruction that initiates the operation. For pipelined 
operations, the pipeline is not advanced. The trap handler can reference both source operands and 
the operation by decoding the instruction specified by fir. 
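The operand test in 7.4.1 can be expressed with the C library's `fpclassify` macro. This is a sketch of the decision only. It assumes the handler has already decoded the operands into `double` values and, for the dual-operation helper, has already sorted the four source operands by unit. The function names are illustrative.

```c
#include <math.h>
#include <stdbool.h>

/* True for the exceptional operand kinds that raise a source exception in
   the adder or multiplier: NaNs (quiet or trapping), infinities, and
   denormalized numbers. Zeros and normal finite values do not fault. */
static bool raises_source_exception(double x)
{
    switch (fpclassify(x)) {
    case FP_NAN:
    case FP_INFINITE:
    case FP_SUBNORMAL:
        return true;
    default:
        return false;
    }
}

/* Dual operations: inspect all four source operands. Bit 0 of the result
   means the multiplier operation needs fixing up, bit 1 the adder. */
static unsigned dual_op_fixup_mask(double m1, double m2, double a1, double a2)
{
    unsigned mask = 0;
    if (raises_source_exception(m1) || raises_source_exception(m2))
        mask |= 1u;
    if (raises_source_exception(a1) || raises_source_exception(a2))
        mask |= 2u;
    return mask;
}
```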

In the case of dual operations, the trap handler has to determine which special registers the source 
operands are stored in and inspect all four source operands to see if one or both operations need 
to be fixed up. It can then compute the appropriate result and store the result in rdest, in the case 
of a scalar operation, or replace the appropriate first-stage result, in the case of a pipelined 
operation. 

Note that, in the following case, inappropriate use of the FTE bit of the fsr can produce an invalid 
operand that does not cause a source exception: 

1. Floating-point traps are masked by clearing the FTE bit.

2. A dual-operation instruction causes underflow or overflow, leaving an invalid result in the T
register.

3. Floating-point traps are enabled by setting the FTE bit. 

4. The invalid result in the T register is used as an operand of a subsequent instruction. 

Even though the result of an operation would normally cause a source exception, it can be inserted 
into the pipeline as follows: 

1. Disable traps by clearing FTE.

2. Perform a pipelined add of the value with zero or a multiply by one.

3. Set the result-status bits of fsr to "normal" by loading fsr with the U-bit set and zeros in the
appropriate unit's result-status bits. The other unit's status must be set to the saved status for
the first pipeline stage.

4. Reenable traps by setting FTE.

5. Set KNF in the psr to avoid reexecuting the instruction.

The trap handler should ignore the SE bit for faults on fld, pfld, fst, pst, and ixfr instructions
when in single-instruction mode or when in dual-instruction mode and the companion instruction 
is not a multiplier or adder operation. The SE value is undefined in this case. 

The trap handler should process result exceptions as described below and reexecute the instruction 
before processing source exceptions. 

7.4.2 Result Exception Faults 

The class of result exceptions includes any of the following conditions: 

• Overflow. The absolute value of the rounded true result would exceed the largest finite 
number in the destination format. 

• Underflow (when FZ is clear). The absolute value of the rounded true result would be smaller 
than the smallest finite number in the destination format. 

• Inexact result (when TI is set). The result is not exactly representable in the destination 
format. For example, the fraction 1/3 cannot be precisely represented in binary form. This 
exception occurs frequently and indicates that some (generally acceptable) accuracy has been 
lost. 
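The 1/3 example follows from a simple rule. A reduced fraction p/q has a terminating binary expansion only when q is a power of two, so any other denominator forces rounding and, with TI set, an inexact-result exception. The following is a minimal C check of that rule. The function names are illustrative, and the check ignores the separate limits imposed by the finite fraction width and exponent range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Greatest common divisor (Euclid). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* True if p/q (q > 0) has a terminating binary expansion, that is, if the
   reduced denominator is a power of two. Such a value is exactly
   representable only if it also fits the destination's fraction and
   exponent range. */
static bool has_finite_binary_expansion(uint64_t p, uint64_t q)
{
    uint64_t d = q / gcd_u64(p, q);
    return (d & (d - 1)) == 0;
}
```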

The point at which a result exception is reported depends upon whether pipelined operations are 
being used: 

• Scalar (nonpipelined) operations. Result exceptions are reported on the next floating-point,
fst.x, or pst.x (and sometimes fld, pfld, ixfr) instruction after the scalar operation.
When a trap occurs, the last stage of the affected unit contains the result of the scalar
operation.

• Pipelined operations. Result exceptions are reported when the result is in the last stage and
the next floating-point, fst.x, or pst.x (and sometimes fld, pfld, ixfr) instruction is executed.
When a trap occurs, the pipeline is not advanced, and the last-stage results (that caused the
trap) remain unchanged.

When no trap occurs (either because FTE is clear or because no exception occurred), the pipeline 
is advanced normally by the new floating-point operation. The result-status bits of the affected 
unit are undefined until the point that result exceptions are reported. At this point, the last-stage
result-status bits (bits 29..22 and 16..9 of the fsr) reflect the values in the last stages of both the
adder and multiplier. For example, if the last-stage result in the multiplier has overflowed and a 
pipelined floating-point pfadd is started, a trap occurs and MO is set. 

For scalar operations, the RR bits of fsr specify the register in which the result was stored. RR is 
updated when the scalar instruction is initiated. The trap, however, occurs on a subsequent 
instruction. Programmers must prevent intervening stores to fsr from modifying the RR bits. 
Prevention may take one of the following forms: 






• Before any store to fsr when a result exception may be pending, execute a dummy floating-point
operation to trigger the result-exception trap.

• Always read from fsr before storing to it, and mask updates so that the RR, RM, and FZ bits 
are not changed. 

For pipelined operations, RR is cleared; the result is in the pipeline of the appropriate unit. 

In either case, the result has the same fraction as the true result and has an exponent which is the 
low-order bits of the true result. The trap handler can inspect the result, compute the result 
appropriate for that instruction (a NaN or an infinity, for example), and store the correct result. 
The result is either stored in the register specified by RR (if nonzero) or in the last stage of the 
pipeline (if RR = 0). The trap handler must clear the result status for the last stage, then reexecute 
the trapping instruction. 

Result exceptions may be reported for both the adder and multiplier units at the same time. In this 
case, the trap handler should fix up the last stage of both pipelines. 

7.5 INSTRUCTION-ACCESS FAULT 

This trap results from a page-not-present exception during instruction fetch. If a supervisor-level 
page is fetched in user mode, an exception may or may not occur. 

7.6 DATA-ACCESS FAULT 

This trap results from an abnormal condition detected during data operand fetch or store. Such an 
exception can be due to one of the following causes: 

• An attempt is being made to write to a page whose D-bit is clear. 

• A memory operand is misaligned (is not located at an address that is a multiple of the length 
of the data). 

• The address stored in the debug register is equal to one of the addresses spanned by the 
operand. 

• The operand is in a not-present page. 

• An attempt is being made from user level to write to a read-only page or to access a
supervisor-level page.
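The alignment and debug-address conditions above reduce to simple address arithmetic. This C sketch assumes a 32-bit address and an operand length that is a power of two (1, 2, 4, 8, or 16 bytes). The function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Misaligned: the address is not a multiple of the operand length. */
static bool is_misaligned(uint32_t addr, uint32_t len)
{
    return (addr % len) != 0;
}

/* Debug match: the debug address is one of the addresses spanned by the
   operand, i.e. it lies in [addr, addr + len). The unsigned subtraction
   wraps to a large value when db < addr, so that case returns false. */
static bool hits_debug_address(uint32_t addr, uint32_t len, uint32_t db)
{
    return db - addr < len;
}
```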

7.7 INTERRUPT TRAP 

An interrupt is an event that is signaled from an external source. If the processor is executing with 
interrupts enabled (IM set in the psr), the processor sets the interrupt bit IN in the psr, and 
generates an interrupt trap. Vectored interrupts are implemented by interrupt controllers and 
software. 
7.8 RESET TRAP

When the i860 Microprocessor is reset, execution begins in single-instruction mode at address
0xFFFFFF00. This is the same address as for other traps. The reset trap can be distinguished from
other traps by the fact that no trap bits are set. The instruction cache is flushed. The bits DPS,
BL, and ATE in dirbase are cleared. CS8 is initialized by the value at the INT pin just before the
end of RESET. The read-only fields of the epsr are set to identify the processor, while the IL,
WP, and PBM bits are cleared. The bits U, IM, BR, and BW in psr are cleared. All other bits of
psr and all other register contents are undefined.

The software must ensure that the data cache is flushed (refer to Chapter 4) and control registers 
are properly initialized before performing operations that depend on the values of the cache or 
registers. The fir must be initialized with a ld.c fir, r0 instruction.

Reset code must initialize the floating-point pipeline states to zero, using dummy pfadd, pfmul,
and pfiadd instructions. Floating-point traps must be disabled to ensure that no spurious
floating-point traps are generated.

After a RESET the i860 Microprocessor starts execution at supervisor level (U=0). Before 
branching to the first user-level instruction, the RESET trap handler or subsequent initialization 
code has to set PU and a trap bit so that an indirect branch instruction will copy PU to U, thereby 
changing to user level. 



7.9 PIPELINE PREEMPTION 

Each of the four pipelines (adder, multiplier, load, graphics) contains state information. The
pipeline state must be saved when a process is preempted or when a trap handler performs
pipelined operations using the same pipeline. The state must be restored when resuming the
interrupted code.

7.9.1 Floating-Point Pipelines 

The floating-point pipeline state consists of the following items:

1. The current contents of the floating-point status register fsr (including the third-stage result
status).

2. Unstored results from the first, second, and third stages. The number of stages that exist in 
the multiplier pipeline depends on the sizes of the operands that occupy the pipeline. The 
MRP bit of fsr helps determine how many stages are in the multiplier pipeline. 

3. The result-status bits for the first two stages. 

4. The contents of the KR, KI, and T registers. 
7.9.2 Load Pipeline

The pipeline state for pfld instructions can be saved by performing three pfld instructions to a
dummy address. The pipeline is thus advanced three stages, causing the last three real operands
to be stored from the pipeline into registers that are then saved in some memory area. The size of
each saved value is indicated by the value of the LRP bit of the fsr.

The load pipeline can be restored by performing three pfld instructions using the memory addresses
of the saved values. The pipeline will then contain the same three values it held before the
preemption.

7.9.3 Graphics Pipeline 

The graphics pipeline has only one stage. To flush the pipeline, execute a pfiadd f0, f0, rdest.
The only other state information for the graphics unit resides in the PM bits of psr, the IRP bit of
the fsr, and the MERGE register. Store the MERGE register with a form instruction. Restore
the MERGE register by using faddz instructions (see Example 7-2).

7.9.4 Examples of Pipeline Preemption 

Example 7-1 shows how to save the pipeline state. 

Example 7-2 shows how to restore the pipeline state. Trap handlers manipulate the result-status 
bits in the floating-point pipelines while preparing for pipeline resumption. When storing to fsr 
with the U-bit set, the result-status bits are loaded into the first stage of the pipelines of the 
floating-point adder and multiplier. The updated result-status bits of a particular unit (multiplier 
or adder) are propagated one stage for each pipelined floating-point operation for that unit. When 
they reach the last stage, they override the normal result-status bits computed from the last-stage 
result. The result-status bits in the fsr always reflect the last-stage result status and cannot be 
directly set by software. 
// The symbols Mres3, Ares3, Mres2, Ares2, Mres1, Ares1,
// Ires1, Lres, KR, KI, and T refer to 64-bit FP registers.
// The symbols Fsr3, Fsr2, Fsr1, Mergelo32, Mergehi32, and Temp
// refer to integer registers.
// The symbols Lres3m, Lres2m, and Lres1m refer to memory locations.
// The symbol Dummy represents an addressing mode that refers to some
// readable location that is always present (e.g., 0(r0)).

// Save third, second, and first stage results
        fld.d      DoubOne, f4          // get double-precision 1.0
        ld.c       fsr, Fsr3            // save third stage result status
        andnot     0x20, Fsr3, Temp     // clear FTE bit
        st.c       Temp, fsr            // disable FP traps
        pfmul.ss   f0, f0, Mres3        // save third stage M result
        pfadd.ss   f0, f0, Ares3        // save third stage A result
        pfld.d     Dummy, Lres          // save third stage pfld result
        fst.d      Lres, Lres3m         //   ... in memory
        ld.c       fsr, Fsr2            // save second stage result status
        pfmul.ss   f0, f0, Mres2        // save second stage M result
        pfadd.ss   f0, f0, Ares2        // save second stage A result
        pfld.d     Dummy, Lres          // save second stage pfld result
        fst.d      Lres, Lres2m         //   ... in memory
        ld.c       fsr, Fsr1            // save first stage result status
        pfmul.ss   f0, f0, Mres1        // save first stage M result
        pfadd.ss   f0, f0, Ares1        // save first stage A result
        pfld.d     Dummy, Lres          // save first stage pfld result
        fst.d      Lres, Lres1m         //   ... in memory
        pfiadd.dd  f0, f0, Ires1        // save vector-integer result

// Save KR, KI, T, and MERGE
        r2apt.dd   f0, f4, f0           // M first stage contains KR
                                        // A first stage contains T
        i2p1.dd    f0, f4, f0           // M first stage contains KI
        pfmul.dd   f0, f0, KR           // save KR register
        pfmul.dd   f0, f0, KI           // save KI register
        pfadd.dd   f0, f0, f0           // adder third stage gets T
        pfadd.dd   f0, f0, T            // save T register
        form       f0, f2               // save MERGE register
        fxfr       f2, Mergelo32
        fxfr       f3, Mergehi32

Example 7-1. Saving Pipeline States
// The symbols Mres3, Ares3, Mres2, Ares2, Mres1, Ares1,
// Ires1, KR, KI, and T refer to 64-bit FP registers.
// The symbols Fsr3, Fsr2, Fsr1, Mergelo32, Mergehi32, and Temp
// refer to integer registers.
// The symbols Lres3m, Lres2m, and Lres1m refer to memory locations.

        st.c       r0, fsr              // clear FTE

// Restore MERGE
        shl        16, Mergelo32, r1    // move low 16 bits to high 16
        ixfr       r1, f2
        shl        16, Mergehi32, r1    // move low 16 bits to high 16
        ixfr       r1, f3
        ixfr       Mergelo32, f4
        ixfr       Mergehi32, f5
        faddz      f0, f2, f0           // merge low 16s
        faddz      f0, f4, f0           // merge high 16s

// Restore KR, KI, and T
        fld.l      SingOne, f2          // get single-precision 1.0
        fld.d      DoubOne, f4          // get double-precision 1.0
        pfmul.dd   f4, T, f0            // put value of T in M 1st stage
        r2pt.dd    KR, f0, f0           // load KR, advance T
        i2apt.dd   KI, f0, f0           // load KI and T

// Restore 3rd stage
        andh       0x2000, Fsr3, r0     // test adder result precision ARP
        bc.t       L0                   // taken if it was single
        pfadd.ss   Ares3, f0, f0        // insert single result
        pfadd.dd   Ares3, f0, f0        // insert double result
L0:     orh        ha%Lres3m, r0, r31
        andh       0x400, Fsr3, r0      // test load result precision LRP
        bc.t       L1                   // taken if it was single
        pfld.l     l%Lres3m(r31), f0    // insert single result
        pfld.d     l%Lres3m(r31), f0    // insert double result
L1:     andh       0x1000, Fsr3, r0     // test multiplier result precision MRP
        bc.t       L2                   // taken if it was single
        pfmul.ss   Mres3, f2, f0        // insert single result
        pfmul3.dd  Mres3, f4, f0        // insert double result
L2:     or         0x10, Fsr3, Temp     // set U (update) bit so that st.c
                                        // will update status bits in pipeline
        andnot     0x20, Temp, Temp     // clear FTE bit so as not to cause traps
        st.c       Temp, fsr            // update stage 3 result status

Example 7-2. Restoring Pipeline States (1 of 2)
// Restore 2nd stage
        andh       0x2000, Fsr2, r0     // test adder result precision ARP
        bc.t       L3                   // taken if it was single
        pfadd.ss   Ares2, f0, f0        // insert single result
        pfadd.dd   Ares2, f0, f0        // insert double result
L3:     orh        ha%Lres2m, r0, r31
        andh       0x400, Fsr2, r0      // test load result precision LRP
        bc.t       L4                   // taken if it was single
        pfld.l     l%Lres2m(r31), f0    // insert single result
        pfld.d     l%Lres2m(r31), f0    // insert double result
L4:     or         0x10, Fsr2, Temp     // set update bit
        andnot     0x20, Temp, Temp     // clear FTE
        andh       0x1000, Fsr2, r0     // test multiplier result precision MRP
        bc.t       L5                   // taken if it was single
        pfmul.ss   Mres2, f2, f0        // insert single result
        pfmul3.dd  Mres2, f4, f0        // insert double result
L5:     st.c       Temp, fsr            // update stage 2 result status

// Restore 1st stage
        andh       0x1000, Fsr1, r0     // test multiplier result precision MRP
        bc.t       L6                   // skip next if double
        pfmul.ss   Mres1, f2, f0        // insert single result
        pfmul3.dd  Mres1, f4, f0        // insert double result
L6:     andh       0x2000, Fsr1, r0     // test adder result precision ARP
        bc.t       L7                   // taken if it was single
        pfadd.ss   Ares1, f0, f0        // insert single result
        pfadd.dd   Ares1, f0, f0        // insert double result
L7:     orh        ha%Lres1m, r0, r31
        andh       0x400, Fsr1, r0      // test load result precision LRP
        bc.t       L8                   // taken if it was single
        pfld.l     l%Lres1m(r31), f0    // insert single result
        pfld.d     l%Lres1m(r31), f0    // insert double result
L8:     andh       0x800, Fsr1, r0      // test vector-integer result precision IRP
        bc.t       L9                   // taken if it was single
        pfiadd.ss  f0, Ires1, f0        // insert single result
        pfiadd.dd  f0, Ires1, f0        // insert double result
L9:     or         0x10, Fsr1, Fsr1     // set U (update) bit
        st.c       Fsr1, fsr            // update stage 1 result status

        st.c       Fsr3, fsr            // restore nonpipelined FSR status

Example 7-2. Restoring Pipeline States (2 of 2)
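The fsr bit masks used by Examples 7-1 and 7-2 can be collected in one place. In this C sketch the names are illustrative, but the values come straight from the listings. Because andh tests the upper halfword, `andh 0x2000` tests bit 29 (ARP), `0x1000` tests bit 28 (MRP), `0x800` tests bit 27 (IRP), and `0x400` tests bit 26 (LRP). Meanwhile, `or 0x10` sets the U bit and `andnot 0x20` clears FTE.

```c
#include <stdbool.h>
#include <stdint.h>

#define FSR_U    0x00000010u      /* update bit:      or 0x10, ...     */
#define FSR_FTE  0x00000020u      /* FP trap enable:  andnot 0x20, ... */
#define FSR_LRP  (0x0400u << 16)  /* load result precision (bit 26)     */
#define FSR_IRP  (0x0800u << 16)  /* vector-integer result prec. (27)  */
#define FSR_MRP  (0x1000u << 16)  /* multiplier result prec. (bit 28)  */
#define FSR_ARP  (0x2000u << 16)  /* adder result precision (bit 29)    */

/* A set precision bit means the saved stage held a double result; a clear
   bit is the "taken if it was single" case in Example 7-2. */
static bool stage_was_double(uint32_t saved_fsr, uint32_t precision_bit)
{
    return (saved_fsr & precision_bit) != 0;
}

/* Word written back with st.c when re-inserting a stage: U set so the
   result status is loaded into the first pipeline stage, FTE clear so the
   re-insertion cannot trap. */
static uint32_t restore_status_word(uint32_t saved_fsr)
{
    return (saved_fsr | FSR_U) & ~FSR_FTE;
}
```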
This chapter defines standards for the use of certain aspects of the architecture of the i860
Microprocessor. These standards must be followed to guarantee that compilers, applications 
programs, and operating systems written by different people and organizations will work together. 

8.1 REGISTER ASSIGNMENT 

Table 8-1 defines the standard for register allocation. Figure 8-1 presents the same information 
graphically. 

Table 8-1. Register Allocation

Register    Purpose                       Left Unchanged by a Subroutine?
r0          Always zero                   Yes
r1          Return address                Yes
r2          Stack pointer                 Note*
r3          Frame pointer                 Yes
r4-r15      Local values                  Yes
r16-r27     Parameters and temporaries    No
r16         Return value                  No
r28-r30     Temporaries                   No
r31         Addressing temporary          No
f0-f1       Always zero                   Yes
f2-f15      Local values                  Yes
f16-f27     Parameters and temporaries    No
f16-f17     Return value                  No
f28-f31     Temporaries                   No

* The stack pointer is normally kept unchanged across a subroutine call. However, some subroutines may allocate stack space
and return with a different value in r2.

NOTE 

The dividing point between locals and parameters and return value in the floating-point
registers is not yet firm. For the purpose of illustration, the dividing point is
shown at f16, but this may change to f8.



8.1.1 Integer Registers 

Up to 12 parameters can be passed in the integer registers. The first (leftmost) parameter is passed
in r16 (if it is an integer), the rest in successively higher-numbered registers. If fewer parameters 
are required, the remaining registers can be used for temporary variables. If more than 12 
parameters are required, the overflow can be passed in memory on the stack. 






Register r16 is both a parameter register and a return value. If a subroutine has an integer return 
value, the value is put into r16 before control is returned to the caller. 

Register r1 is the required return-address register, because the call instruction uses it to save the
return address. Subroutines are therefore required to use r1 to return to the caller. If a subroutine
saves r1, it may then use it as a temporary until it returns.

A separate addressing temporary register (r31) is allocated to allow construction of 32-bit
absolute-address temporaries. The assembler uses r31 by default to construct 32-bit absolute
addresses from 16-bit literals.





[Figure: integer registers r0-r31 (32 bits wide) and floating-point registers shown as 64-bit pairs f0, f2, ..., f30, labeled by use: ZERO, RETURN ADDRESS, STACK POINTER, FRAME POINTER, LOCALS, PARAMETERS, TEMPORARIES, and ADDRESS TEMP.]

Figure 8-1. Register Allocation
8.1.2 Floating-Point Registers

Floating-point and 64-bit integer values in the floating-point registers must use f16-f27 when
passed by value. The leftmost parameter is passed in f17-f16 (if it is floating-point); the rest are
passed in successively higher-numbered registers. Single-precision parameters use two registers, just
as do double-precision parameters. The single-precision value must be in the even-numbered register;
the corresponding odd-numbered register is left unused in this case. A single-precision floating-point
value can be converted to double-precision with the fmov.sd fx, fy pseudoinstruction.

Parameters beyond f26-f27 are passed in memory on the stack. The last (i.e., rightmost) parameter
is at the highest stack address (i.e., it is pushed first, assuming a grow-down stack). The same registers
used to pass the first parameter are used for the return value when the return value is a floating- 
point value or 64-bit integer. A subroutine may need to save the first parameter to make room for 
the return value. 

8.1.3 Passing Mixed Integer and Floating-Point Parameters in Registers

If parameter N is an integer parameter, then it is placed in integer register 16 + N, and the
double-precision register at 16 + 2N is available for use as a local variable. If parameter M is a
floating-point parameter, then it is placed in the floating-point register pair at 16 + 2M, and the
integer register 16 + M is available for use as a local variable. (Parameters are numbered from 0,
so the first parameter uses r16 or the f16-f17 pair.)
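The mapping in 8.1.3 can be written as a small C function. This sketch follows the tentative convention stated above. It numbers parameters from 0 and returns -1 when a parameter falls past the register area (r27 for integers, the f26-f27 pair for floating-point values) and must be passed on the stack. The names are illustrative.

```c
typedef enum { PARAM_INT, PARAM_FP } ParamKind;

/* Register number holding parameter n (counting from 0): integer
   parameters use r(16+n); floating-point parameters use the pair whose
   even register is f(16+2n). Returns -1 for a stack-passed parameter. */
static int param_register(ParamKind kind, int n)
{
    if (n < 0)
        return -1;
    if (kind == PARAM_INT)
        return (n < 12) ? 16 + n : -1;          /* r16..r27 */
    return (16 + 2 * n <= 26) ? 16 + 2 * n : -1; /* f16..f26 (even) */
}
```

For a hypothetical `f(int a, double b, int c)`, `a` goes in r16, `b` in the f18-f19 pair (leaving r17 free), and `c` in r18.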

NOTE 

This convention remains tentative. It may change to allow all integer and floating 
parameter registers to contain parameter values. 

8.1.4 Variable Length Parameter Lists 

Parameter passing in registers can handle variable parameters. UNIX* System V uses a special
method to access variable-count parameters. The varargs.h file defines several functions to get at
these parameters in a way that is independent of stack growth direction and of whether parameters
are passed in registers or on the stack. A subroutine with variable parameters calls va_start to
force them onto the stack before they can be used. The routine va_start must be called at the
beginning of a subroutine. This method works with current C standards.
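The manual describes the UNIX System V `varargs.h` interface. Its ISO C successor, `<stdarg.h>`, follows the same pattern: `va_start` must run before any variable argument is read. The sketch below uses `<stdarg.h>`. It does not show how a particular i860 compiler implements `va_start` (for example, by spilling the parameter registers to the stack).

```c
#include <stdarg.h>

/* Sums 'count' int arguments. va_start is called first, before any
   variable argument is fetched with va_arg. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```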

8.2 DATA ALIGNMENT 

Compilers and assemblers must do their best to keep data aligned. It is acceptable to have holes 
in data structures to keep all items aligned. In some cases (e.g. FORTRAN programs with 
overlaid data), it is necessary to have misaligned data. A run-time trap handler can be provided to 
handle misaligned data; however, such data would impose a performance penalty on the 
application. If a compiler must reference data that is misaligned, the compiler must generate 
separate instructions to access the data in smaller units that will not generate misaligned-data 
traps. Accessing 16-bit misaligned data requires two byte loads plus a shift. Storing to 32-bit 
misaligned data requires four byte stores and three shifts. The code example in Example 8-1 is 
the recommended method for reading a misaligned 32-bit value whose address is in r8.
        andnot     3, r8, r9            // Get address aligned on 4-byte boundary
        ld.l       0(r9), r10           // Get low 32-bit value
        ld.l       4(r9), r11           // Get high 32-bit value
        and        3, r8, r9            // Get byte offset in 8-byte field
        shl        3, r9, r9            // Convert to bit offset
        shr        r9, r0, r0           // Set shift count
        shrd       r11, r10, r9         // Put 32-bit value into r9

// If the misalignment offset (m) is known in advance, this code can be
// optimized. Assume r8 points to the nearest aligned address below the
// address of the misaligned field.

        ld.l       0(r8), r10           // Get low value
        ld.l       4(r8), r11           // Get high value
        shr        m*8, r0, r0          // Set shift count
        shrd       r11, r10, r9         // Put 32-bit value into r9

Example 8-1. Reading Misaligned 32-Bit Value
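Example 8-1 can be modeled in C to check the shift arithmetic. This sketch assumes little-endian byte order and a byte array standing in for memory, with eight readable bytes from the aligned base; a 64-bit right shift plays the role of `shrd`. The store side follows the text: four byte stores, with three shifts producing the upper bytes. The function names are illustrative.

```c
#include <stdint.h>

/* Read: two aligned 32-bit loads, then extract the 32 bits that start
   'offset' bytes into the 8-byte field. */
static uint32_t read_misaligned32(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;              /* andnot 3, r8, r9          */
    uint32_t lo = 0, hi = 0;
    for (int i = 0; i < 4; i++) {            /* ld.l 0(r9) / ld.l 4(r9)   */
        lo |= (uint32_t)mem[base + i] << (8 * i);
        hi |= (uint32_t)mem[base + 4 + i] << (8 * i);
    }
    unsigned shift = (addr & 3u) * 8;        /* and 3 ...; shl 3 ...       */
    uint64_t pair = ((uint64_t)hi << 32) | lo;  /* hi:lo as shrd sees it  */
    return (uint32_t)(pair >> shift);        /* shrd r11, r10, r9          */
}

/* Store: four byte stores; shifts by 8, 16 and 24 give the upper bytes. */
static void write_misaligned32(uint8_t *mem, uint32_t addr, uint32_t v)
{
    mem[addr]     = (uint8_t)v;
    mem[addr + 1] = (uint8_t)(v >> 8);
    mem[addr + 2] = (uint8_t)(v >> 16);
    mem[addr + 3] = (uint8_t)(v >> 24);
}
```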



8.3 IMPLEMENTING A STACK 

In general, compilers and programmers have to maintain a software stack. Register r2 (called sp
in assembly language) is the suggested stack pointer. Register r2 is set by the operating system
for the application when the program is started. The stack must be a grow-down stack, so as to
be compatible with that of the 386™. If a subroutine call requires placing parameters on the
stack, then the caller is responsible for adjusting the stack pointer upon return. The caller must
also allocate space on the stack for the overflow parameters (i.e., parameters that exceed the
capacity of the registers reserved for passing parameters) and store them there directly for the call
operation.

A separate frame pointer is used because C allows calls to subroutines that change the stack
pointer to allocate space on the stack at run-time (e.g., alloca and va_start). Other languages
may also return values from a subroutine allocated on stack space below the original top-of-stack
pointer. Such a subroutine prevents the caller from using r2-relative addressing to get at values on
the stack. If the compiler knows that it does not call subroutines that leave r2 in an altered state
when they return, then no frame pointer is necessary.

The stack must be kept aligned on 16-byte boundaries to keep data arrays aligned. Each subroutine 
must use stack space in multiples of 16 bytes. The frame pointer r3 (called fp in assembly 
language) need not point to a 16-byte boundary, as long as the compiler keeps data correctly 
aligned when assigning positions relative to r3.

Figure 8-2 shows the stack-frame format. A fixed format is necessary to allow some minimal 
stack-frame analysis by a low-level debugger. 
[Figure: 32-bit-wide stack frame that expands toward lower addresses. Just below the old sp lie the RETURN POINTER and then the OLD FRAME POINTER, to which fp points; below fp is the program-SPECIFIC DYNAMIC STORAGE, extending down to the new sp. SP = stack pointer; FP = frame pointer.]

Figure 8-2. Stack Frame Format



8.3.1 Stack Entry and Exit Code 

Example 8-2 shows the recommended entry and exit code sequences. The stack pointer is restored 
to the value it had on entry into the subroutine. Assuming the subroutine needs to call another 
subroutine, it must save the frame pointer and its return address. It probably also needs to save 
some of its internal values across that call to another subroutine; therefore, the example saves one 
local register into the stack frame and subsequently reloads it.

Languages such as Pascal that need to maintain activation records on the stack can put them 
below the frame pointer in the program-specific area. The frame pointer is optional. All stack 
references can be made relative to r2. The code example in Example 8-3 shows the recommended
entry and exit sequences when no frame pointer is required. 

A lowest-level subroutine need not perform any stack accesses if it can run completely from the 
temporary registers. No entry/exit code is required by a lowest-level subroutine. 
// Subroutine entry
        adds       -(Locals+8), sp, sp  // Allocate stack space for local variables
                                        // Locals+8 must be a multiple of 16
        st.l       fp, Locals(sp)       // Save old frame pointer below old SP
        adds       Locals, sp, fp       // Set new frame pointer
        st.l       r1, 4(fp)            // Save return address
        st.l       r5, -4(fp)           // Save a local register

// Subroutine exit
        ld.l       -4(fp), r5           // Restore a local register
        mov        fp, sp               // Deallocate stack frame
        ld.l       4(fp), r1            // Restore return address
        ld.l       0(fp), fp            // Restore old frame pointer
        bri        r1                   // Return to caller after next instruction
        adds       8, sp, sp            // Deallocate frame pointer save area

Example 8-2. Subroutine Entry and Exit with Frame Pointer
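The frame layout produced by Example 8-2 can be checked with a small C model. The sketch assumes a toy byte array for memory and a hypothetical `Locals` value of 24 in the test (so that Locals+8 is a multiple of 16). Each statement mirrors one instruction from the listing, and the branch delay slot is modeled simply as the last statement of the exit sequence.

```c
#include <stdint.h>
#include <string.h>

/* Toy memory: addresses are byte offsets into mem[]. */
enum { MEM_BYTES = 256 };
static uint8_t mem[MEM_BYTES];

static void st_l(uint32_t addr, uint32_t v) { memcpy(&mem[addr], &v, sizeof v); }
static uint32_t ld_l(uint32_t addr) { uint32_t v; memcpy(&v, &mem[addr], sizeof v); return v; }

/* The registers touched by Example 8-2. */
typedef struct { uint32_t sp, fp, r1, r5; } Regs;

/* Entry sequence; 'locals' stands in for the assembler constant Locals. */
static void enter_frame(Regs *r, uint32_t locals)
{
    r->sp -= locals + 8;          /* adds -(Locals+8), sp, sp                 */
    st_l(r->sp + locals, r->fp);  /* st.l fp, Locals(sp): old fp at old sp-8  */
    r->fp = r->sp + locals;       /* adds Locals, sp, fp                      */
    st_l(r->fp + 4, r->r1);       /* st.l r1, 4(fp): return addr at old sp-4  */
    st_l(r->fp - 4, r->r5);       /* st.l r5, -4(fp)                          */
}

/* Exit sequence. */
static void leave_frame(Regs *r)
{
    r->r5 = ld_l(r->fp - 4);      /* ld.l -4(fp), r5                 */
    r->sp = r->fp;                /* mov fp, sp                      */
    r->r1 = ld_l(r->fp + 4);      /* ld.l 4(fp), r1                  */
    r->fp = ld_l(r->fp);          /* ld.l 0(fp), fp                  */
    r->sp += 8;                   /* adds 8, sp, sp (bri delay slot) */
}
```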



// Subroutine entry
        addu       -Locals, r2, r2      // Allocate stack space for local variables
                                        // -Locals must be a multiple of 16

// Subroutine exit
        bri        r1                   // Return to caller after next instruction
        addu       Locals, r2, r2       // Restore stack pointer

Example 8-3. Subroutine Entry and Exit without Frame Pointer



8.3.2 Dynamic Memory Allocation on the Stack 

Consider a function alloca, which allocates space on the stack and returns a pointer to the space.
The allocated space is lost when the caller returns. The function alloca could be implemented as
shown in Example 8-4. Because alloca returns with r2 altered, a caller that uses it needs a frame
pointer separate from the stack pointer.



alloca::                                // r16 has size requested
        adds       15, r16, r16         // Round size to mod 16
        andnot     15, r16, r16
        subs       sp, r16, sp          // Adjust stack downwards
        bri        r1                   // Return to caller after next instruction
        mov        sp, r16              // Set return value to allocated space

Example 8-4. Possible Implementation of alloca
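The rounding in Example 8-4 is the usual add-then-mask idiom. This C sketch applies the same arithmetic to a numeric stack pointer. The names are illustrative, and it models only the address arithmetic, not a usable allocator.

```c
#include <stdint.h>

/* Round a byte count up to a multiple of 16 (adds 15 / andnot 15). */
static uint32_t round_up16(uint32_t n)
{
    return (n + 15u) & ~15u;
}

/* Move the stack pointer down by the rounded size and return the new sp,
   which is also the pointer alloca hands back in r16. */
static uint32_t alloca_model(uint32_t *sp, uint32_t size)
{
    *sp -= round_up16(size);   /* subs sp, r16, sp                  */
    return *sp;                /* mov sp, r16 (in the delay slot)   */
}
```

Because the starting sp is 16-byte aligned and every adjustment is a multiple of 16, the returned block stays aligned as 8.3 requires.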
8.4 MEMORY ORGANIZATION

Figure 8-3 suggests an overall memory layout. The i860 Linker needs to know by default where 
to assign code and data inside a program. The output of the linker must normally be executable 
without fixups. Code and data of both the application and operating system can share a single 
four-gigabyte address space. The example memory map assumes paging is being used to place 
DRAM-resident code in the upper 256 Mbytes of the address space. 







[Figure: memory map from 0xFFFFFFFF down to 0x00000000. From high to low addresses: OPERATING SYSTEM CODE AREA; EMPTY; USER CODE AREA (from 0xF0400000); FIXED SUBROUTINE ENTRIES (0xF0000000 to 0xF0400000); OPERATING SYSTEM DATA; SPECIAL SHARED MEMORY AREA BETWEEN DIFFERENT TASKS; USER STACK SPACE; EMPTY; USER DYNAMIC HEAP; USER DATA (from 0x00001000); OPERATING SYSTEM DATA AREA (0x00000000 to 0x00001000).]

Figure 8-3. Example Memory Layout

The first four Kbytes (first page) of the address space are reserved for the operating system. It 
should be a supervisor-only page and should not be swappable. Uninitialized external address 
references in user programs (which are equivalent to an assembly-language address expression of 
the form 0(r0)) reference this first page and cause a trap.
The data space for the application begins at 0x1000 (second page). It is all readable and writable.
The total data address space available to the application should be over 3500 Mbytes. The user's 
data space has the following sections: 

• A user-data portion whose size and content is defined by the program and development tools. 

• A section called the heap whose size is determined at run time and can change as the program 
executes. 

• A stack section. 

The application's stack area starts at some address set by the OS and grows downward. The 
starting address of the stack would normally be at a four-Mbyte boundary to allow easy
page-table formatting. The stack's starting address is not known in advance. It depends on how much
address space is used by the operating system at the top of the address space. 

The operating system may also want to reserve some portion of the application's address space 
for shared memory areas with other tasks. UNIX System V allows such shared memory areas. 
The empty areas in Figure 8-3 would normally be marked as not-present in the
page table entries. Some special flag in the page table entry could allow the operating system to 
determine that the page is not usable instead of just not present in memory. 

A four-Mbyte area of code space is reserved starting at 0xF0000000 for a set of entry addresses 
to subroutines commonly used by all application programs (math libraries and vector primitives, 
for example). These code sections are shared by all application programs. The code in this area is 
directly callable from user-level code and executes at user level. Standard i860 Microprocessor 
calling conventions are used for these subroutines. The size of this area is chosen as four Mbytes 
because that size corresponds to a directory-level page table entry that all application tasks can 
share. It should be large enough to contain all desirable shared code. 

The application program code area starts at 0xF0400000. It can be as large as 248 Mbytes. The 
application code is write-protected. The operating system and application code spaces lie in the 
upper 256 Mbytes of the address space. The operating system code is in the upper part of the 
256-Mbyte code space and is protected from application programs. Because it 
is easier for the operating system to divide up the address space in four-Mbyte blocks, the 
minimum operating-system code allocation from the address space is probably four Mbytes. 
Additional space would be allocated in four-Mbyte increments. 
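The address arithmetic behind this layout can be checked with a few C constants. This is a sketch: the boundary values come from the text above, but the macro and function names are hypothetical, and the directory-index calculation assumes 4-Kbyte pages with each directory entry covering 4 Mbytes (1024 pages), which is what the remark about a directory-level page table entry implies.

```c
#include <assert.h>
#include <stdint.h>

#define MBYTE             (1024u * 1024u)
#define SHARED_ENTRY_BASE 0xF0000000u     /* fixed subroutine entries */
#define USER_CODE_BASE    0xF0400000u     /* application code area */
#define CODE_REGION_SIZE  (256u * MBYTE)  /* upper 256 Mbytes hold all code */
#define MIN_OS_CODE       (4u * MBYTE)    /* probable minimum OS code allocation */

/* Assumed two-level paging: the directory index is the top 10 bits of the
   linear address, so each directory entry spans 4 Mbytes. */
static uint32_t dir_index(uint32_t linear_addr)
{
    return linear_addr >> 22;
}

/* Space left for application code after the shared-entry area and the
   minimum OS code allocation are taken out of the 256-Mbyte code region. */
static uint32_t max_user_code_bytes(void)
{
    uint32_t shared = USER_CODE_BASE - SHARED_ENTRY_BASE;
    assert(shared == 4u * MBYTE);           /* one directory entry's worth */
    return CODE_REGION_SIZE - shared - MIN_OS_CODE;
}
```

Under these assumptions the shared-entry area occupies exactly one directory entry (index 960), the application code begins at the next one, and 256 - 4 - 4 leaves the 248 Mbytes quoted above.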

Every code section should begin with a nop instruction so that the trap handler can always 
examine the instruction at fir - 4, even if a trap occurs on the first instruction of a section. 

The memory-mapped I/O devices should also be placed in the upper operating-system data space. 
The paging hardware allows logical addresses to be different from their corresponding physical 
addresses, so the I/O device logical address area may be located anywhere convenient. 
9.1 SMALL INTEGERS 

The 32-bit arithmetic instructions can be used to implement arithmetic on 8- or 16-bit ordinals 
and integers. The integer load instruction places 8- or 16-bit values in the low-order end of a 
32-bit register and propagates the sign bit through the high-order bits of the register. 

Occasionally, it is necessary to sign extend 8- or 16-bit integers that are generated internally, not 
loaded from memory. Example 9-1 shows how. 



// SIGN-EXTEND 8-BIT INTEGER TO 32 BITS 
// Assume the operand is already in r16 

shl 24, r16, r16 // left-justify 

shra 24, r16, r16 // right-justify all but sign bit 

Example 9-1. Sign Extension 
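For checking the shift pair, here is a C model of Example 9-1. It is illustrative only: the function name is hypothetical. Because C's `>>` on a negative signed value is implementation-defined, the arithmetic right shift (shra) is spelled out on unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends the low `width` bits of reg the way Example 9-1 does with
   shl followed by shra (width is 8 for bytes, 16 for halfwords). */
static int32_t sign_extend(uint32_t reg, unsigned width)
{
    assert(width > 0 && width < 32);
    unsigned shift = 32u - width;
    uint32_t left = reg << shift;          /* shl: left-justify the field */
    uint32_t right = left >> shift;        /* logical shift back down, then */
    if (left & 0x80000000u)                /* copy the sign bit into the */
        right |= ~(0xFFFFFFFFu >> shift);  /* vacated high bits, as shra does */
    /* Reinterpret the 32-bit pattern as two's complement portably. */
    return (right & 0x80000000u) ? -(int32_t)(~right) - 1 : (int32_t)right;
}
```

For a 16-bit integer the same pair would use a shift count of 16 instead of 24.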



Example 9-2 shows how to load a small unsigned integer, converting the sign-extended form 
created by the load instruction to a zero-extended form. 



// LOADING OF 8-BIT UNSIGNED INTEGERS 
// Assume the address is already in r19 

// Load the operand (sign-extended) into r20 
ld.b 0(r19), r20 

// Mask out the high-order bits 
and 0x000000FF, r20, r20 

Example 9-2. Loading Small Unsigned Integers 
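The same load-then-mask idea can be modeled in C. `ld_b` and `ld_b_unsigned` are hypothetical names. `ld_b` stands in for the sign-extending ld.b instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Models ld.b: the byte at addr is sign-extended into a 32-bit register. */
static int32_t ld_b(const uint8_t *addr)
{
    int32_t v = *addr;                    /* 0..255 */
    return (v & 0x80) ? v - 256 : v;
}

/* Example 9-2: load sign-extended, then and 0x000000FF to zero-extend. */
static uint32_t ld_b_unsigned(const uint8_t *addr)
{
    uint32_t reg = (uint32_t)ld_b(addr);  /* 0xF0 arrives as 0xFFFFFFF0 */
    return reg & 0x000000FFu;             /* and 0x000000FF, r20, r20 */
}
```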



9.2 SINGLE-PRECISION DIVIDE 

Example 9-3 computes Z = X ÷ Y for single-precision variables. The algorithm begins by using 
the reciprocal instruction frcp to obtain an initial guess for the value of 1/Y. The frcp instruction 
gives a result that can differ from the true value of 1/Y by as much as 2^-8. The algorithm then 
continues to make guesses based on the prior guess, refining each guess until the desired accuracy 
is achieved. Let G represent a guess, and let E represent the error, i.e., the difference G - 1/Y 
between the guess and the true value of 1/Y. For each guess: 

$$G_{new} = G_{old}\,(2 - G_{old}\,Y)$$
$$E_{new} = -Y\,E_{old}^{2}$$

The relative error E·Y is therefore squared on each iteration. This is why the comments in the listings show the error shrinking from about 2^-8 to 2^-15 and then to 2^-29. 
This algorithm is optimized for high performance and does not produce results that are rounded 
according to the IEEE standard. Worst-case error is about two least-significant bits. If the result 
is referenced by the next instruction, 22 clocks are required to perform the divide. 



// SINGLE-PRECISION DIVIDE 
// The dividend X is in f6 
// The divisor Y is in f2 
// The result Z is left in f3 
// f5 contains single-precision floating-point 2 

frcp.ss  f2, f3          // first guess has 2**-8 error 
fmul.ss  f2, f3, f4      // guess * divisor 
fsub.ss  f5, f4, f4      // 2 - guess * divisor 
fmul.ss  f3, f4, f3      // second guess has 2**-15 error 
fmul.ss  f2, f3, f4      // avoid using f3 as src1 
fsub.ss  f5, f4, f4      // 2 - guess * divisor 
fmul.ss  f6, f3, f5      // second guess * dividend 
fmul.ss  f4, f5, f3      // result = second guess * dividend 

Example 9-3. Single-Precision Divide 
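The dataflow of Example 9-3 can be mirrored in C to watch the error shrink. This sketch makes two assumptions. First, float is IEEE single precision. Second, frcp is replaced by a hypothetical stand-in: the exact reciprocal cut to 8 mantissa bits. The real instruction's approximation method is not described here, so results may differ from the i860 in the last bit or two.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frcp.ss stand-in: exact reciprocal with all but 8 explicit
   mantissa bits cleared (relative error below 2^-8). Normal inputs only. */
static float approx_recip_ss(float y)
{
    float r = 1.0f / y;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;              /* sign, 8-bit exponent, top 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Same sequence of operations as Example 9-3. */
static float sp_divide(float x, float y)
{
    const float two = 2.0f;           /* f5 */
    assert(y != 0.0f);
    float g = approx_recip_ss(y);     /* frcp.ss: first guess, about 2^-8 error */
    float t = two - y * g;            /* fmul.ss, fsub.ss: 2 - guess * divisor */
    g = g * t;                        /* second guess, about 2^-15 error */
    t = two - y * g;                  /* 2 - guess * divisor again */
    return (x * g) * t;               /* dividend * guess, then final correction */
}
```

As in the listing, the last correction factor is applied to the product with the dividend rather than to the guess itself, which saves a multiply.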



9.3 DOUBLE-PRECISION DIVIDE 

Example 9-4 computes Z = X ÷ Y for double-precision variables. The algorithm is similar to 
that shown previously for single-precision divide. For double-precision divide, one more iteration 
is needed to achieve the required accuracy. 

This algorithm is optimized for high performance and does not produce results that are rounded 
according to the IEEE standard. Worst-case error is about two least-significant bits. If the result 
is referenced by the next instruction, 38 clocks are required to perform the divide. 



// DOUBLE-PRECISION DIVIDE 
// The dividend X is in f2 
// The divisor Y is in f4 
// The result Z is left in f8 

frcp.dd  f4, f6          // first guess has 2**-8 error 
fmul.dd  f4, f6, f8      // guess * divisor 
fld.d    flttwo, f10     // load double-precision floating 2 
                         // The fld.d is free. It completely overlaps the preceding fmul.dd 
fsub.dd  f10, f8, f8     // 2 - guess * divisor 
fmul.dd  f6, f8, f6      // second guess has 2**-15 error 
fmul.dd  f4, f6, f8      // avoid using f6 as src1 
fsub.dd  f10, f8, f8     // 2 - guess * divisor 
fmul.dd  f6, f8, f6      // third guess has 2**-29 error 
fmul.dd  f4, f6, f8      // avoid using f6 as src1 
fsub.dd  f10, f8, f8     // 2 - guess * divisor 
fmul.dd  f6, f2, f6      // guess * dividend 
fmul.dd  f8, f6, f8      // result = third guess * dividend 

Example 9-4. Double-Precision Divide 
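The double-precision sequence can be mirrored the same way. The sketch makes two assumptions. First, double is IEEE double precision. Second, frcp.dd is replaced by a hypothetical stand-in, not the hardware's approximation. The three refinements are kept, and the last is folded into the product with the dividend, exactly as the final two fmul.dd instructions do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frcp.dd stand-in: exact reciprocal truncated to 8 explicit
   mantissa bits (relative error below 2^-8). Normal, nonzero inputs only. */
static double approx_recip_dd(double y)
{
    double r = 1.0 / y;
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF00000000000ull;     /* sign, 11-bit exponent, 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Same sequence of operations as Example 9-4. */
static double dp_divide(double x, double y)
{
    const double two = 2.0;            /* loaded into f10 */
    assert(y != 0.0);
    double g = approx_recip_dd(y);     /* frcp.dd f4, f6 */
    double t = two - y * g;            /* fmul.dd, fsub.dd */
    g = g * t;                         /* second guess */
    t = two - y * g;
    g = g * t;                         /* third guess */
    t = two - y * g;
    return (g * x) * t;                /* fmul.dd f6, f2, f6 ; fmul.dd f8, f6, f8 */
}
```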
9.4 INTEGER MULTIPLY 

A 32-bit integer multiply is implemented in Example 9-5 by transferring the operands to 
floating-point registers and using the fmlow instruction. If the result is referenced in the next 
instruction, nine clocks are required. Five clocks can be overlapped with other operations. 



// INTEGER MULTIPLY 

// The multiplier is in r4 

// The multiplicand is in r5 

// The product is left in r6 

// The registers f2, f4, and f6 are used as temporaries. 

ixfr r4, f2 

ixfr r5, f4 
// Two core instructions can be inserted here without penalty. 

fmlow.dd f4, f2, f6 
// Two core instructions can be inserted here without penalty. 

fxfr f6, r6 
// One core instruction can be inserted here without penalty. 

Example 9-5. Integer Multiply 
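In C terms, the integer product that comes out of Example 9-5 is the low-order 32 bits of the full product. Those bits are the same whether the operands are read as signed or unsigned. The sketch below assumes, as the final fxfr implies, that those low-order 32 bits are what reach r6. `mul_low32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Low-order 32 bits of the product of two 32-bit operands. The bit pattern
   is identical for two's complement and unsigned interpretations. */
static uint32_t mul_low32(uint32_t multiplier, uint32_t multiplicand)
{
    uint64_t full = (uint64_t)multiplier * multiplicand;  /* full integer product */
    return (uint32_t)full;                                 /* low word, as fxfr delivers */
}
```

As with any 32-bit multiply, a product that does not fit in 32 bits wraps; for example, 0x10000 * 0x10000 yields 0.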



9.5 CONVERSION FROM SIGNED INTEGER TO DOUBLE 

The strategy used in Example 9-6 is to use the bits of the integer to construct a value in 
double-precision format. The double-precision value constructed contains two biases: 

BC    A bias that compensates for the fact that the signed integer is stored in two's 
      complement format. The value of this bias is 2^31. 

BN    A bias that produces a normalized number, so that the algorithm does not cause a 
      floating-point exception. The value of this bias is 2^52. 

If the desired value is x, then the constructed value is x + BC + BN. By later subtracting BC + 
BN, the value x is left in double-precision format, properly normalized by the i860 Microprocessor. 
The value of BC + BN is 2^52 + 2^31 (0x4330_0000_8000_0000). 



// CONVERT SIGNED INTEGER TO DOUBLE 

// The integer is in r4 

// The double-precision floating-point result is left in f7:f6 

// The register f5:f4 contains BN+BC 

xorh 0x8000, r4, r4 // Complement sign bit (equivalent to adding BC) 

ixfr r4, f6 // Construct low half. 

fmov.ss f5, f7 // Set exponent in high half (includes BN) 
// One instruction can be inserted here without penalty. 

fsub.dd f6, f4, f6 // (x + BN + BC) - (BN + BC) = x 
// Two core instructions can be inserted here without penalty. 

Example 9-6. Single to Double Conversion 
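The bias trick of Example 9-6 works on any machine with IEEE 754 doubles, so it can be sketched in C. The sketch assumes 64-bit integers and doubles share the same byte order, which holds on mainstream platforms. `int_to_double` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* C model of Example 9-6. */
static double int_to_double(int32_t x)
{
    const uint64_t bias_bits = 0x4330000080000000ull;  /* BN + BC = 2^52 + 2^31 */
    double bias;
    memcpy(&bias, &bias_bits, sizeof bias);

    uint32_t lo = (uint32_t)x ^ 0x80000000u;        /* xorh 0x8000: adds BC mod 2^32 */
    uint64_t bits = (bias_bits & 0xFFFFFFFF00000000ull) | lo;  /* ixfr + fmov.ss */
    double biased;                                  /* equals x + BC + BN exactly */
    memcpy(&biased, &bits, sizeof biased);
    return biased - bias;                           /* fsub.dd removes BN + BC */
}
```

Complementing bit 31 maps the signed range [-2^31, 2^31) onto [0, 2^32), so the low word is always a valid unsigned field under the fixed exponent that BN supplies.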
The conversion requires 7 clocks if the result is referenced in the next instruction. Three clocks 
can be overlapped with other operations. 

9.6 SIGNED INTEGER DIVIDE 

Example 9-7 combines the techniques of Sections 9.3 and 9.5. It requires 62 clocks (59 clocks 
without remainder). 



// SIGNED INTEGER DIVIDE 
// The denominator is in r4 
// The numerator is in r5 
// The quotient is left in r6 
// The remainder is left in r7 
// The registers f2 through f11 are used as temporaries. 
// r6 and r7 also serve as scratch registers until the results are stored, 
// so that r4 and r5 still hold the original operands for the remainder. 

// Convert Denominator and Numerator 

fld.d     two52two31, f6  // load constant 2**52 + 2**31 
xorh      0x8000, r4, r6  // complement sign bit of denominator (adds 2**31) 
ixfr      r6, f4          // construct low half of denominator 
fmov.ss   f7, f5          // set exponent in high half 
xorh      0x8000, r5, r7  // complement sign bit of numerator (adds 2**31) 
fsub.dd   f4, f6, f4      // denominator as double in f4 
ixfr      r7, f2          // construct low half of numerator 
fmov.ss   f7, f3          // set exponent in high half 
fsub.dd   f2, f6, f2      // numerator as double in f2 

// Do Floating-Point Divide 

fld.d     fdtwo, f10      // load floating-point two 
frcp.dd   f4, f6          // first guess has 2**-8 error 
fmul.dd   f4, f6, f8      // guess * divisor 
fsub.dd   f10, f8, f8     // 2 - guess * divisor 
fmul.dd   f6, f8, f6      // second guess has 2**-15 error 
fmul.dd   f4, f6, f8      // avoid using f6 as src1 
fsub.dd   f10, f8, f8     // 2 - guess * divisor 
fmul.dd   f6, f8, f6      // third guess has 2**-29 error 
fmul.dd   f4, f6, f8      // avoid using f6 as src1 
fsub.dd   f10, f8, f8     // 2 - guess * divisor 
fmul.dd   f6, f2, f6      // guess * dividend 
fmul.dd   f8, f6, f8      // result = third guess * dividend 

// Convert Quotient to Integer 

fld.d     onepluseps, f10 // load value 1 + 2**-40 
fmul.dd   f8, f10, f8     // force quotient to be bigger than integer 
ixfr      r4, f10         // get denominator for remainder computation 
ftrunc.dd f8, f8          // convert to integer 

// Compute Remainder 

fmlow.dd  f10, f8, f10    // quotient * denominator 
fxfr      f10, r4         // low 32 bits of quotient * denominator 
fxfr      f8, r6          // transfer quotient 
subs      r5, r4, r7      // remainder = numerator - quotient * denominator 

Example 9-7. Signed Integer Divide 
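The role of the 1 + 2**-40 factor is easiest to see in a C model. This sketch does not reproduce the reciprocal iteration. Instead, `approx_quotient` (a hypothetical helper) performs an ordinary division and then deliberately pulls the result a relative 2^-50 low. That stands in for a quotient landing a few units in the last place short of an exact integer.

The bias is large enough to lift such a result back above the integer. For 32-bit operands it is also too small to push a non-integer quotient past the next integer. The bias adds at most |n/d|·2^-40 ≤ 2^-9/|d|, while a non-integer quotient is at least 1/|d| away from the next integer of larger magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the quotient from the reciprocal iteration:
   an exact division nudged low by a relative 2^-50. */
static double approx_quotient(double n, double d)
{
    return (n / d) * (1.0 - 0x1p-50);
}

/* Model of Example 9-7; C's conversion to int32_t truncates toward zero,
   like ftrunc.dd. */
static void int_divide(int32_t n, int32_t d, int32_t *quot, int32_t *rem)
{
    assert(d != 0 && !(n == INT32_MIN && d == -1));
    double q = approx_quotient((double)n, (double)d);
    q *= 1.0 + 0x1p-40;          /* onepluseps: lift exact quotients past the integer */
    int32_t qi = (int32_t)q;     /* ftrunc.dd: convert to integer */
    *quot = qi;
    /* The i860 code needs only the low 32 bits (fmlow, subs); 64-bit
       arithmetic gives the same remainder here without wraparound. */
    *rem = (int32_t)((int64_t)n - (int64_t)qi * d);
}
```

Without the bias, `approx_quotient(9.0, 3.0)` truncates to 2. With it, `int_divide(9, 3, ...)` yields quotient 3 and remainder 0, matching C's own `/` and `%`.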
9.7 STRING COPY 

Example 9-8 shows how to avoid the freeze condition that might occur when using a load in a 
tight loop such as that commonly used for copying strings. A performance penalty is incurred if 
the destination of a load is referenced in the next instruction. In order to avoid this condition, 
Example 9-8 juggles characters of the string between two registers. 



// STRING COPY 
// Assumptions: 
//   Source address alignment unknown 
//   Destination address alignment unknown 
//   End of string indicated by NUL 
//   r17 - address of source string 
//   r16 - address of destination string 

copy_string:: 
        ld.b    0(r17), r26         // Load one character 
        bte     0, r26, done        // Test for NUL character 
        adds    1, r17, r17         // Bump pointer to source string 
        ld.b    0(r17), r27         // Load one more character 
        subs    r17, r16, r18       // Use constant offset to avoid 
                                    // incrementing two indexes 
loop: 
        st.b    r26, 0(r16)         // Store previous character 
        adds    1, r16, r16         // Bump common index 
        or      r0, r27, r26        // Test for NUL character 
        bnc.t   loop                // If not NUL, branch after loading 
        ld.b    r18(r16), r27       // next character. r18(r16) = 0(r17) 
done: 
        bri     r1                  // Return after storing 
        st.b    r26, 0(r16)         // the NUL character, too 

Example 9-8. String Copy 
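A C rendering of the same idea, offered only as a sketch, makes two points visible. First, one index serves both strings because the source is read at a fixed offset ahead of the destination. Second, the character being stored is always the one loaded on the previous pass, so no load result is needed by the very next operation. The function name is illustrative. It assumes the buffers do not overlap and is not meant to replace `strcpy`.

```c
#include <assert.h>
#include <stddef.h>

static void copy_string(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;                  /* common index (r16 relative to dst) */
    char cur = src[0];             /* ld.b 0(r17), r26 */
    if (cur != '\0') {             /* bte 0, r26, done */
        char next = src[1];        /* ld.b: load one more character (r27) */
        do {
            dst[i] = cur;          /* st.b r26, 0(r16): store previous character */
            i++;                   /* adds 1, r16, r16: bump common index */
            cur = next;            /* or r0, r27, r26: move and test for NUL */
            if (cur != '\0')
                next = src[i + 1]; /* ld.b r18(r16), r27: fixed offset from dst index */
        } while (cur != '\0');     /* bnc.t loop */
    }
    dst[i] = cur;                  /* final st.b stores the NUL */
}
```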



9.8 FLOATING-POINT PIPELINE 

Most instruction sequences that use pipelined instructions can be divided into three phases: 



Priming               Filling a pipeline with known intermediate results while 
                      disposing of previous pipeline contents. 

Continuous Operation  Receiving expected results with the initiation of each new 
                      pipelined instruction. 

Flushing              Retrieving the results that remain in the pipeline after the 
                      pipelined instruction sequence has terminated. 



Example 9-9 shows one strategy for using the floating-point adder, which has a three-stage 
pipeline. This example assumes that the prior contents of the adder's pipeline are unimportant, 
and discards them by specifying register f0 as the destination of the first three instructions. After 
performing the intended calculations, it flushes the pipeline by executing three dummy addition 
instructions with f0 (which always contains zero) as the operands. 
// PIPELINED FLOATING-POINT ADD 
// Calculates f10 = f4 + f5, f11 = f6 + f7 
//            f12 = f8 + f9, f13 = f5 + f6 
// Assume     f4 = 1.0, f5 = 2.0, f6 = 3.0 
//            f7 = 4.0, f8 = 5.0, f9 = 6.0 
// 
//                          Stage 1  Stage 2  Stage 3  Result 
// Priming phase 
pfadd.ss f4, f5, f0      // 1+2      ??       ??       Discard 
pfadd.ss f6, f7, f0      // 3+4      1+2      ??       Discard 
pfadd.ss f8, f9, f0      // 5+6      3+4      3        Discard 
// Continuous operation phase 
pfadd.ss f5, f6, f10     // 2+3      5+6      7        f10=3 
// For longer pipelined sequences, include more instructions here 
// Flushing phase 
pfadd.ss f0, f0, f11     // 0+0      2+3      11       f11=7 
pfadd.ss f0, f0, f12     // 0+0      0+0      5        f12=11 
pfadd.ss f0, f0, f13     // 0+0      0+0      0        f13=5 

Example 9-9. Pipelined Add 
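A toy C model makes the three phases concrete. It assumes a three-stage adder in which each pfadd advances the pipe by one stage. It computes each sum as it enters stage 1 rather than across the stages. That is enough to reproduce the Result column of Example 9-9, but the model is not cycle-accurate. `AddPipe` and `pfadd` are illustrative names, not an API.

```c
#include <assert.h>

/* Toy three-stage adder: stage[0] is stage 1, stage[2] is stage 3. */
typedef struct { float stage[3]; } AddPipe;

/* One pipelined add: the new sum enters stage 1, and the value that
   drops out of stage 3 is what the instruction writes to its destination. */
static float pfadd(AddPipe *p, float src1, float src2)
{
    float result = p->stage[2];
    p->stage[2] = p->stage[1];
    p->stage[1] = p->stage[0];
    p->stage[0] = src1 + src2;
    return result;
}
```

Feeding the model the seven instructions of Example 9-9 (three priming, one continuous, three flushing) returns 3, 7, 11, and 5 from the last four, whatever the pipe held beforehand.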



9.9 PIPELINING OF DUAL-OPERATION INSTRUCTIONS 

When using dual-operation instructions (all of which are pipelined), code that primes and flushes 
the pipelines must take into account both the adder and multiplier pipelines. Example 9-10 
illustrates pipeline usage for a simple single-precision matrix operation: the dot product of a 1x8 
row matrix A with an 8x1 column matrix B. For the purpose of tracking values through the 
pipelines, assume that the actual matrices to be multiplied have the following values: 



A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] 

B = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0] (an 8x1 column, written here as a row) 

Assume further that the two matrices are already loaded into registers as follows: 

A:  f4  = 1.0        B:  f12 = 8.0 
    f5  = 2.0            f13 = 7.0 
    f6  = 3.0            f14 = 6.0 
    f7  = 4.0            f15 = 5.0 
    f8  = 5.0            f16 = 4.0 
    f9  = 6.0            f17 = 3.0 
    f10 = 7.0            f18 = 2.0 
    f11 = 8.0            f19 = 1.0 
The calculation to perform is 1.0*8.0 + 2.0*7.0 + ... + 8.0*1.0: a series of multiplications 
followed by additions. The dual-operation instructions are designed precisely to execute this type 
of calculation efficiently by using the adder and multiplier in parallel. At the heart of Example 
9-10 is the dual-operation instruction m12apm, which multiplies its operands and adds the 
multiplier result to the result of the adder. 

The priming phase is somewhat different in Example 9-10 than in Example 9-9. Because the 
result of the adder is fed back into the adder, it is not possible to simply ignore the prior contents 
of the adder pipeline; and because the result of the multiplier is automatically fed into the adder, 
it is important to consider the effect of the multiplier on the adder pipeline as well. This example 
waits until unknown results have been flushed from the multiplier pipeline, then uses pfadd 
instructions to put zeros in all stages of the adder pipeline. 
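The priming rule just described can be tried out on a small C model of the two pipelines. It is a sketch under stated assumptions, not a cycle-accurate simulator:

- There are three multiplier stages and three adder stages.
- Each instruction advances only the pipelines it uses.
- m12apm feeds the adder with the multiplier's finished product plus the adder's own finished result.
- The value leaving the adder is the instruction's result.

`DualPipe`, `m12apm`, `pfadd`, and `dot_product8` are illustrative names. For brevity the model drains the three partial sums left in the adder and adds them in C, rather than reproducing the register juggling at the end of Example 9-10.

```c
#include <assert.h>

/* Index 2 is the stage whose value leaves the pipe next. */
typedef struct { float m[3]; float a[3]; } DualPipe;

/* pfadd: push x + y into the adder; return the value leaving stage 3. */
static float pfadd(DualPipe *p, float x, float y)
{
    float out = p->a[2];
    p->a[2] = p->a[1];
    p->a[1] = p->a[0];
    p->a[0] = x + y;
    return out;
}

/* m12apm: the adder sums the multiplier's finished product and the adder's
   own finished result while x * y enters the multiplier. */
static float m12apm(DualPipe *p, float x, float y)
{
    float out = pfadd(p, p->m[2], p->a[2]);
    p->m[2] = p->m[1];
    p->m[1] = p->m[0];
    p->m[0] = x * y;
    return out;
}

/* Same phase structure as Example 9-10, for exactly eight elements. */
static float dot_product8(const float a[8], const float b[8])
{
    DualPipe p = {{-1.0f, -1.0f, -1.0f}, {-1.0f, -1.0f, -1.0f}}; /* stale contents */
    int i;
    for (i = 0; i < 3; i++)             /* priming: start three known products */
        (void)m12apm(&p, a[i], b[i]);
    for (i = 0; i < 3; i++)             /* put zeros in all adder stages */
        (void)pfadd(&p, 0.0f, 0.0f);
    for (i = 3; i < 8; i++)             /* continuous operation */
        (void)m12apm(&p, a[i], b[i]);
    for (i = 0; i < 3; i++)             /* flush the multiplier */
        (void)m12apm(&p, 0.0f, 0.0f);
    float s0 = pfadd(&p, 0.0f, 0.0f);   /* drain the three partial sums */
    float s1 = pfadd(&p, 0.0f, 0.0f);
    float s2 = pfadd(&p, 0.0f, 0.0f);
    return s0 + s1 + s2;
}
```

With the A and B values given above, `dot_product8` returns 120, the same value the listing leaves in f20.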

// PIPELINED DUAL-OPERATION INSTRUCTION 
//                         Multiplier Stages Adder Stages 
//                         1     2     3     1      2      3     Result 
// Priming phase 
m12apm.ss f4, f12, f0   // 1*8   ??    ??    ??     ??     ??    Discard 
m12apm.ss f5, f13, f0   // 2*7   1*8   ??    ??     ??     ??    Discard 
m12apm.ss f6, f14, f0   // 3*6   2*7   8     ??     ??     ??    Discard 
pfadd.ss f0, f0, f0     //                   0+0    ??     ??    Discard 
pfadd.ss f0, f0, f0     //                   0+0    0+0    ??    Discard 
pfadd.ss f0, f0, f0     //                   0+0    0+0    0     Discard 
// Continuous operation phase 
m12apm.ss f7, f15, f0   // 4*5   3*6   14    8+0    0+0    0     Discard 
m12apm.ss f8, f16, f0   // 5*4   4*5   18    14+0   8+0    0     Discard 
m12apm.ss f9, f17, f0   // 6*3   5*4   20    18+0   14+0   8     Discard 
m12apm.ss f10, f18, f0  // 7*2   6*3   20    20+8   18+0   14    Discard 
m12apm.ss f11, f19, f0  // 8*1   7*2   18    20+14  20+8   18    Discard 
// For larger matrices, include more instructions here 
// Flushing phase 
m12apm.ss f0, f0, f0    // 0*0   8*1   14    18+18  20+14  28    Discard 
m12apm.ss f0, f0, f0    // 0*0   0*0   8     14+28  18+18  34    Discard 
m12apm.ss f0, f0, f0    // 0*0   0*0   0     8+34   14+28  36    Discard 
pfadd.ss f0, f0, f20    //                   0+0    8+34   42    f20=36 
pfadd.ss f20, f21, f21  //                   42+36  0+0    42    f21=42 
pfadd.ss f0, f0, f20    //                   0+0    42+36  0     f20=42 
pfadd.ss f0, f0, f0     //                   0+0    0+0    78    Discard 
pfadd.ss f0, f0, f21    //                   0+0    0+0    0     f21=78 
fadd.ss f20, f21, f20   //                                       f20=120 

Example 9-10. Pipelined Dual-Operation Instruction 

9.10 DUAL INSTRUCTION MODE 

The previous Example 9-9 and Example 9-10 showed how the i860 Microprocessor can deliver 
up to two floating-point results per clock by using the pipelining and parallelism of the adder and 
multiplier units. These examples, however, are not realistic, because they assume that the data is 
already loaded in registers. Example 9-11 goes one step further and shows how to maintain the 
high throughput of the floating-point unit while simultaneously loading the data from main 
memory and controlling the logical flow. 

The problem is to sum the single-precision elements of an arbitrarily long vector. The procedure 
uses dual-instruction mode to overlap loading, decision making, and branching with the basic 
pipelined floating-point add instruction pfadd.ss. To make the pairing of core and floating-point 
instructions in dual-instruction mode obvious, the listing in Example 9-11 shows the core 
instruction of a dual-mode pair indented with respect to the corresponding floating-point 
instruction. 

Elements are loaded two at a time into alternating pairs of registers: one time at loop1 into f20 
and f21, the next time at loop2 into f22 and f23. Performance would be slightly degraded if the 
destination of an fld.d were referenced as a source operand in the next two instructions. The 
strategy of alternating registers avoids this situation and maintains maximum performance. Some 
extra logic is needed at sumup to account for an odd number of elements. 
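The accumulation in Example 9-11 can be sketched with a toy C model of the three-stage adder. One assumption is inferred from the listing's comments rather than stated in the text. When a pfadd names the same register as a source and as the destination (for example pfadd.ss f20, f30, f30), the source operand is the result leaving the pipeline in that same instruction. Under that reading, the pipe carries three interleaved running sums. The names are illustrative, and the model does not reproduce the loads, the register pairs, or the final register juggling at done.

```c
#include <assert.h>

typedef struct { float stage[3]; } AddPipe;   /* stage[2] leaves next */

/* Plain pfadd: x + y enters stage 1; the value leaving stage 3 is returned. */
static float pfadd(AddPipe *p, float x, float y)
{
    float out = p->stage[2];
    p->stage[2] = p->stage[1];
    p->stage[1] = p->stage[0];
    p->stage[0] = x + y;
    return out;
}

/* pfadd.ss fN, f30, f30 as the listing's comments describe it: the element
   is added to the result leaving the pipe in the same instruction. */
static float pfadd_accumulate(AddPipe *p, float element)
{
    return pfadd(p, element, p->stage[2]);
}

static float vector_sum(const float *v, int n)
{
    AddPipe p = {{-1.0f, -1.0f, -1.0f}};      /* stale contents */
    int i;
    for (i = 0; i < 3; i++)
        (void)pfadd(&p, 0.0f, 0.0f);           /* "Clear adder pipe (1)..(3)" */
    for (i = 0; i < n; i++)
        (void)pfadd_accumulate(&p, v[i]);      /* three interleaved running sums */
    float a3 = pfadd(&p, 0.0f, 0.0f);          /* drain the partial sums ... */
    float a2 = pfadd(&p, 0.0f, 0.0f);
    float a1 = pfadd(&p, 0.0f, 0.0f);
    return a1 + a2 + a3;                       /* ... and combine them */
}
```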

9.11 CACHE STRATEGIES FOR MATRIX DOT PRODUCT 

Calculations that use (and reuse) massive amounts of data may deliver significantly less than 
optimal performance unless their memory access demands are carefully taken into consideration 
during algorithm design. Example 9-11 easily executes at nearly the theoretical maximum 
speed of the i860 Microprocessor because it does not make heavy demands on the memory 
subsystem. This section considers a more demanding calculation, the dot product of two matrices, 
and analyzes two memory access strategies as they apply to this calculation. 

The product of matrix A = (A_ij) of dimension L×M with matrix B = (B_ij) of dimension M×N is the 
matrix C = (C_ij) of dimension L×N, where 

$$C_{i,j} = A_{i,1}B_{1,j} + A_{i,2}B_{2,j} + \cdots + A_{i,M}B_{M,j} \qquad (1 \le i \le L,\; 1 \le j \le N)$$

The basic algorithm for calculation of a dot product appears in Example 9-10. To extend this 
algorithm to the current problem requires adding instructions to: 

1. Load the entries of each matrix from memory at appropriate times. 

2. Repeat the inner loop as many times as necessary to span matrices of arbitrary M dimension. 

3. Repeat the entire algorithm L×N times to produce the L×N product matrix. 

Examples 9-12 and 9-13 each accomplish the above extensions through straightforward 
programming techniques. Each example uses dual-instruction mode to perform the loading and 
loop control operations in parallel with the basic floating-point calculations. The examples differ 
in their approaches to memory access and cache usage. To eliminate needless complexity, the 
examples require that the M dimension be a multiple of eight and that the B matrix be stored in 
memory by column instead of by row. Data is fetched 32 bytes beyond the higher-address end of 
both matrices. In real applications, programmers should ensure that no page protection faults 
occur due to these accesses. 
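The computation itself, without any pipelining or caching strategy, can be written as a short C reference using the same storage conventions as Examples 9-12 and 9-13. A is stored by rows, B by columns, C by rows, and M is a multiple of eight. It is a sketch for checking results, and the function name is illustrative. Because the i860 code keeps three interleaved partial sums in the adder, its floating-point rounding can differ slightly from this left-to-right summation.

```c
#include <assert.h>
#include <stddef.h>

static void matmul_rows_by_cols(const float *a, const float *b_cols, float *c,
                                size_t L, size_t M, size_t N)
{
    assert(M % 8 == 0);
    for (size_t i = 0; i < L; i++) {            /* start_row: one row of A */
        const float *row = a + i * M;
        for (size_t j = 0; j < N; j++) {        /* one column of B */
            const float *col = b_cols + j * M;  /* each column is contiguous */
            float sum = 0.0f;
            for (size_t k = 0; k < M; k += 8)   /* inner_loop: eight entries */
                for (size_t u = 0; u < 8; u++)
                    sum += row[k + u] * col[k + u];
            *c++ = sum;                         /* like fst.l f22, 4(r18)++ */
        }
    }
}
```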
// SINGLE-PRECISION VECTOR SUM 
// input:  r16 - vector address 
//         r17 - vector size (must be > 5) 
// output: f16 - sum of vector elements 

spvsum:: 
        fld.d       r0(r16), f20        // Load first two elements 
        mov         -2, r21             // Loop decrement for bla 
                                        // Initiate entry into dual-instruction mode 
        d.pfadd.ss  f0, f0, f0          // Clear adder pipe (1) 
        adds        -6, r17, r17        // Decrement size by 6 
                                        // Enter into dual-instruction mode 
        d.pfadd.ss  f0, f0, f0          // Clear adder pipe (2) 
            bla     r21, r17, loop1     // Initialize LCC 
        d.pfadd.ss  f0, f0, f0          // Clear adder pipe (3) 
            fld.d   8(r16)++, f22       // Load 3rd and 4th elements 
loop1:: 
        d.pfadd.ss  f20, f30, f30       // Add f20 to pipeline 
            bla     r21, r17, loop2     // If more, go to loop2 after 
        d.pfadd.ss  f21, f31, f31       // adding f21 to pipeline and 
            fld.d   8(r16)++, f20       // loading next f20:f21 

// If we reach this point, at least one element remains 
// to be loaded. r17 is either -4 or -3. 
// f20, f21, f22, and f23 still contain vector elements. 
// Add f20 and f21 to the pipeline, too. 
        d.pfadd.ss  f20, f30, f30 
            br      sumup               // Exit loop after adding 
        d.pfadd.ss  f21, f31, f31       // f21 to the pipeline 
            nop 
loop2:: 
        d.pfadd.ss  f22, f30, f30       // Add f22 to pipeline 
            bla     r21, r17, loop1     // If more, go to loop1 after 
        d.pfadd.ss  f23, f31, f31       // adding f23 to pipeline and 
            fld.d   8(r16)++, f22       // loading next f22:f23 

// If we reach this point, at least one element remains 
// to be loaded. r17 is either -4 or -3. 
// f20, f21, f22, and f23 still contain vector elements. 
// Add f20 and f21 to the pipeline, too. 
        d.pfadd.ss  f20, f30, f30 
            nop 
        d.pfadd.ss  f21, f31, f31 
            nop 
sumup:: 
                                        // Initiate exit from dual mode 
        pfadd.ss    f22, f30, f30       // Still in dual mode 
            mov     -4, r21 
        pfadd.ss    f23, f31, f31       // Last dual-mode pair 
            bte     r21, r17, done      // If there is one more 
        fld.l       8(r16)++, f20       // element, load it and 
        pfadd.ss    f20, f30, f30       // add to pipeline 

// Intermediate results are sitting in the adder pipeline. 
// Let A1:A2:A3 represent the current pipeline contents. 
done:: 
        pfadd.ss    f0, f0, f30         // 0:A1:A2      f30=A3 
        pfadd.ss    f30, f31, f31       // A2+A3:0:A1   f31=A2 
        pfadd.ss    f0, f0, f30         // 0:A2+A3:0    f30=A1 
        pfadd.ss    f0, f0, f0          // 0:0:A2+A3 
        pfadd.ss    f0, f0, f31         // 0:0:0        f31=A2+A3 
        fadd.ss     f30, f31, f16       // f16 = A1+A2+A3 

Example 9-11. Dual-Instruction Mode 
• Example 9-12 depends solely on cached loads. 

• Example 9-13 depends on a mix of cached and pipelined loads. 

Example 9-12 uses the fld instruction for all loads, which places all elements of both matrices A 
and B in the cache. This approach is ideal for small matrices. Accesses to all elements (after the 
first access to each) retrieve elements from the cache at the rate of one per clock. Using fld.q 
instructions to retrieve four elements at a time, it is possible to overlap all data access as well as 
loop control with m12apm instructions in the inner loop. 

Note, however, that Example 9-12 is "cache bound"; i.e., if the combined size of the two 
matrices is greater than that of the cache, cache misses will occur, degrading performance. The 
larger the matrices, the more misses will occur. 



// MATRIX MULTIPLY, C = A * B, CACHED LOADS ONLY 
// Registers loaded by calling routine 
//   r16 - pointer into A, stored in memory by rows 
//   r17 - pointer into B, stored in memory by columns 
//   r18 - pointer into C, stored in memory by rows 
//   r19 - L, the number of rows in A 
//   r20 - M, the number of columns in A and rows in B 
//   r21 - N, the number of columns in B 
// Registers used locally 
//   r28 - row/column counter decremented by bla for loop control 
//   r27 - decrementor for row/column pointers 
//   r26 - counter of rows in A 
//   r25 - counter of columns in B 
//   r24 - temporary pointer into B 
//   r23 - number of bytes in row of A or column of B 
//   f4..f11  - matrix A row values 
//   f12..f19 - matrix B column values 
//   f20..f22 - temporary results 

        shl         2,r20,r23           // Number of bytes in M entries 
        adds        -8,r0,r27           // Set decrementor for bla 
        adds        -8,r20,r28          // Initialize row/column counter 
        adds        -4,r18,r18          // Start C index one entry low 
        d.fiadd.dd  f0,f0,f0            // Initiate dual-instruction mode 
        adds        -1,r19,r26          // Make row counter zero relative 
        d.fnop                          // First dual-mode pair 
            bla     r27,r28,start_row   // Initialize LCC 
        d.fnop                          // 
            subs    r16,r23,r16         // Start pointer to A one row low 
start_row::                             // Executed once per row of A 
        d.pfmul.ss  f0,f0,f0            // 
            mov     r17,r24             // Point to first col of B 
        d.pfmul.ss  f0,f0,f0            // 
            adds    r23,r16,r16         // Point to next row of A 
        d.pfmul.ss  f0,f0,f0            // 
            fld.q   16(r24),f16         // Load 4 entries of B 
        d.pfadd.ss  f0,f0,f0            // 
            fld.q   16(r16),f8          // Load 4 entries of A 
        d.pfadd.ss  f0,f0,f0            // 
            adds    -1,r21,r25          // Initialize column counter 
        d.pfadd.ss  f0,f0,f0            // 
            fld.q   0(r16),f4           // Load 4 entries of A 

Example 9-12. Matrix Multiply, Cached Loads Only (sheet 1 of 2) 

inner_loop::    // Process eight entries of row of A with eight of col of B 
        d.m12apm.ss f8,f16,f20          // 
            fld.q   0(r24),f12          // Load 4 entries of B 
        d.m12apm.ss f9,f17,f20          // 
            adds    32,r16,r16          // Bump pointer to A by 8 entries 
        d.m12apm.ss f10,f18,f20         // 
            adds    32,r24,r24          // Bump pointer to B by 8 entries 
        d.m12apm.ss f11,f19,f20         // 
            fld.q   16(r24),f16         // Load 4 entries of B 
        d.m12apm.ss f4,f12,f20          // 
            fld.q   16(r16),f8          // Load 4 entries of A 
        d.m12apm.ss f5,f13,f20          // 
            nop                         // 
        d.m12apm.ss f6,f14,f20          // 
            bla     r27,r28,inner_loop  // Loop until end of row/column 
        d.m12apm.ss f7,f15,f21          // 
            fld.q   0(r16),f4           // Load 4 entries of A 
                                        // End Inner Loop. End of row/column 
        d.m12apm.ss f0,f0,f22           // 
            subs    r16,r23,r16         // Set A pointer back to beginning of row 
        d.m12apm.ss f0,f0,f20           // 
            adds    -8,r20,r28          // Reinitialize row/column counter 
        d.m12apm.ss f0,f0,f21           // 
            nop                         // 
        d.pfadd.ss  f0,f0,f22           // 
            bla     r27,r28,inner_loop  // Won't branch; initializes LCC 
        d.pfadd.ss  f0,f0,f20           // 
            fld.q   16(r16),f8          // Load 4 entries of A 
        d.pfadd.ss  f0,f0,f21           // 
            fld.q   16(r24),f16         // Load 4 entries of B 
        d.fadd.ss   f20,f22,f22         // 
            fld.q   0(r16),f4           // Load 4 entries of A 
        d.fadd.ss   f21,f22,f22         // 
            adds    -1,r25,r25          // Decrement column counter 
        d.pfadd.ss  f0,f0,f0            // 
            fst.l   f22,4(r18)++        // Store row/column product in C 
                                        // Continue with next column of B? 
        d.pfadd.ss  f0,f0,f0            // 
            bnc.t   inner_loop          // CC controlled by prior adds 
        d.pfadd.ss  f0,f0,f0            // 
            nop                         // 
                                        // Continue with next row of A? 
        d.fnop                          // 
            xor     r26,r0,r0           // Is row counter zero? 
        d.fnop                          // 
            bnc.t   start_row           // Taken if row counter not zero 
        d.fnop                          // 
            adds    -1,r26,r26          // Decrement row counter 
        fnop                            // Initiate exit from dual mode 
            nop                         // 
        fnop                            // Last dual-mode pair 
            nop                         // End 

Example 9-12. Matrix Multiply, Cached Loads Only (sheet 2 of 2) 
// MATRIX MULTIPLY, C = A * B, CACHED AND PIPELINED LOADS MIXED 
// Registers loaded by calling routine 
//   r16 - pointer into A, stored in memory by rows 
//   r17 - pointer into B, stored in memory by columns 
//   r18 - pointer into C, stored in memory by rows 
//   r19 - L, the number of rows in A 
//   r20 - M, the number of columns in A and rows in B 
//   r21 - N, the number of columns in B 
// Registers used locally 
//   r29 - temporary pointer into A 
//   r28 - row/column counter decremented by bla for loop control 
//   r27 - decrementor for row/column pointers 
//   r26 - counter of rows in A 
//   r25 - counter of columns in B 
//   r24 - temporary pointer into B 
//   r23 - number of bytes in row of A or column of B 
//   f4..f11  - matrix A row values 
//   f12..f19 - matrix B column values 
//   f20..f22 - temporary results 

        mov         r17,r24             // Pointer to B 
        shl         2,r20,r23           // Number of bytes in M entries 
        adds        -8,r0,r27           // Set decrementor for bla 
        adds        -8,r20,r28          // Initialize row/column counter 
        d.fiadd.dd  f0,f0,f0            // Initiate dual-instruction mode 
        adds        -4,r18,r18          // Start C index one entry low 
        d.fnop                          // First dual-mode pair 
            adds    -1,r19,r26          // Make row counter zero relative 
        d.fnop                          // 
            bla     r27,r28,start_row   // Initialize LCC 
        d.fnop                          // 
            mov     r16,r29             // Pointer to A 
start_row::                             // Executed once per row of A 
        d.pfmul.ss  f0,f0,f0            // 
            pfld.d  0(r24),f0           // Load 2 entries of B into load pipe 
        d.pfmul.ss  f0,f0,f0            // 
            pfld.d  8(r24)++,f0         // Load 2 entries of B into load pipe 
        d.pfmul.ss  f0,f0,f0            // 
            pfld.d  8(r24)++,f0         // Load 2 entries of B into load pipe 
        d.pfadd.ss  f0,f0,f0            // 
            fld.q   0(r29),f4           // Load 4 entries of A 
        d.pfadd.ss  f0,f0,f0            // 
            pfld.d  8(r24)++,f12        // Load 2 entries of B 
        d.pfadd.ss  f0,f0,f0            // 
            adds    -1,r21,r25          // Initialize column counter 
        d.fnop                          // 
            pfld.d  8(r24)++,f14        // Load 2 entries of B 
inner_loop::    // Process eight entries from row of A with eight from col of B 
        d.m12apm.ss f4,f12,f0           // 
            fld.q   16(r29)++,f8        // Load 4 entries of A 
        d.m12apm.ss f5,f13,f0           // 
            pfld.d  8(r24)++,f16        // Load 2 entries of B 
        d.m12apm.ss f6,f14,f0           // 
            pfld.d  8(r24)++,f18        // Load 2 entries of B 

Example 9-13. Matrix Multiply, Cached and Pipelined Loads (sheet 1 of 2) 

        d.m12apm.ss f7,f15,f0           // 
            fld.q   16(r29)++,f4        // Load 4 entries of A 
        d.m12apm.ss f8,f16,f0           // 
            nop                         // 
        d.m12apm.ss f9,f17,f0           // 
            pfld.d  8(r24)++,f12        // Load 2 entries of B 
        d.m12apm.ss f10,f18,f0          // 
            bla     r27,r28,inner_loop  // Loop until end of row/column 
        d.m12apm.ss f11,f19,f0          // 
            pfld.d  8(r24)++,f14        // Load 2 entries of B 
                                        // End Inner Loop. End of row/column 
        d.m12apm.ss f0,f0,f0            // 
            nop                         // 
        d.m12apm.ss f0,f0,f0            // 
            adds    -8,r20,r28          // Reinitialize row/column counter 
        d.m12apm.ss f0,f0,f0            // 
            mov     r16,r29             // Set A pointer back to beginning of row 
        d.pfadd.ss  f0,f0,f22           // 
            fld.q   0(r29),f4           // Load first 4 entries of row of A 
        d.pfadd.ss  f0,f0,f20           // 
            bla     r27,r28,inner_loop  // Won't branch; initializes LCC 
        d.pfadd.ss  f0,f0,f21           // 
            nop                         // 
        d.fadd.ss   f20,f22,f22         // 
            nop                         // 
        d.fadd.ss   f21,f22,f22         // 
            adds    -1,r25,r25          // Decrement column counter 
        d.pfadd.ss  f0,f0,f0            // 
            fst.l   f22,4(r18)++        // Store row/column product in C 
                                        // Continue with next column of B? 
        d.pfadd.ss  f0,f0,f0            // 
            bnc.t   inner_loop          // CC controlled by prior adds 
        d.pfadd.ss  f0,f0,f0            // 
            nop                         // 
                                        // End of all columns of B 
        d.fnop                          // 
            mov     r17,r24             // Point to first col of B 
        d.fnop                          // 
            adds    r16,r23,r16         // Bump pointer to A by one row 
        d.fnop                          // 
            mov     r16,r29             // Set A index to beginning of next row 
                                        // Continue with next row of A? 
        d.fnop                          // 
            xor     r26,r0,r0           // Is row counter zero? 
        d.fnop                          // 
            bnc.t   start_row           // Taken if row counter not zero 
        d.fnop                          // 
            adds    -1,r26,r26          // Decrement row counter 
        fnop                            // Initiate exit from dual mode 
            nop                         // 
        fnop                            // Last dual-mode pair 
            nop                         // End 

Example 9-13. Matrix Multiply, Cached and Pipelined Loads (sheet 2 of 2) 






Example 9-13 uses fld for all the elements of each row of A, and uses pfld to pass all columns of
B against each row of A. This example is less cache bound, because only rows of A are placed
in the cache. More load instructions are required, because a pfld can load at most two
single-precision operands. Still, with pipelined memory cycles, it remains possible to overlap the
loading of the eight items from matrix A, the eight items from matrix B, and the loop control with
the eight m12apm instructions in the inner loop.
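In the inner loop the running sums never leave the floating-point adder: each d.m12apm.ss adds the product emerging from the multiplier pipeline to the value emerging from the adder pipeline. Assuming the three-stage single-precision adder pipeline, this yields three independent partial sums, which the code after the loop retrieves with three d.pfadd.ss instructions and folds together with two d.fadd.ss instructions. The Python sketch below is a hypothetical model of that accumulation order only (the function name is illustrative); it ignores pipeline latency, dual-instruction pairing, and rounding.

```python
def dot_three_partial_sums(row, col, stages=3):
    """Sketch of the Example 9-13 accumulation: each adder-pipeline slot
    carries its own running sum, because m12apm adds the product leaving the
    multiplier to the sum leaving the adder. Latency and rounding are ignored."""
    if len(row) != len(col):
        raise ValueError("row and column must have the same length")
    partial = [0.0] * stages
    for k, (a, b) in enumerate(zip(row, col)):
        partial[k % stages] += a * b   # product joins the sum in slot k mod 3
    # Drain and fold, as the pfadd/fadd sequence after the loop does.
    total = partial[0]
    for s in partial[1:]:
        total = s + total
    return total, partial
```

For eight entries, slots 0, 1, and 2 receive products 0/3/6, 1/4/7, and 2/5, so the final fold is what turns three interleaved sums into the row/column product.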

The strategy of Example 9-13 is suitable for larger matrices than the strategy in Example 9-12
because, even in the extreme case where only one row of A fits in the cache, cache misses occur
only the first time each row is processed. However, if dimension M is so great that not even one
row of A fits entirely in the cache, cache misses will still occur. On the other hand, for small
matrices, Example 9-13 may not perform as well as Example 9-12 because, even when there is
sufficient space in the cache for elements of matrix B, Example 9-13 does not use it.
Appendix A
Instruction Set Summary

Key to abbreviations:

src1            A register (integer or floating-point depending on class of instruction) or a
                16-bit immediate constant or address offset. The immediate value is
                zero-extended for logical operations and is sign-extended for add and subtract
                operations (including addu and subu) and for all addressing calculations.

src1ni          Same as src1 except that no immediate constant or address offset is
                permitted.

src2            A register (integer or floating-point depending on class of instruction).

rdest           A register (integer or floating-point depending on class of instruction).

freg            A floating-point register.

ireg            An integer register.

ctrlreg         One of the control registers fir, psr, epsr, dirbase, db, or fsr.

#const          A 16-bit immediate constant or address offset that the i860 Microprocessor
                sign-extends to 32 bits when computing the effective address.

mem.x(address)  The contents of the memory location indicated by address with a size of x.

.p              Precision specification. Unless otherwise specified, floating-point operations
                accept single- or double-precision source operands and produce a result of
                equal or greater precision. Both input operands must have the same
                precision. The source and result precision are specified by a two-letter
                suffix to the mnemonic of the operation, as shown in the table below.

                    Suffix   Source Precision   Result Precision
                    .ss      single             single
                    .sd      single             double
                    .dd      double             double

.w              .ss (32 bits), or .dd (64 bits)

.x              .b (8 bits), .s (16 bits), or .l (32 bits)

.y              .l (32 bits), .d (64 bits), or .q (128 bits)

.z              .l (32 bits), or .d (64 bits)
lbroff          A signed, 26-bit, immediate, relative branch offset.

sbroff          A signed, 16-bit, immediate, relative branch offset.

brx             A function that computes the target address of a branch by shifting the
                offset (either lbroff or sbroff) left by two bits, sign-extending it to 32 bits,
                and adding the result to the address of the current control-transfer instruction
                plus four.

src1s           An integer register or a 5-bit immediate constant that is zero-extended to
                32 bits.

comp2           A function that returns the two's complement of its argument.

PM              The pixel mask, which is considered as an array of eight bits PM[0]..PM[7],
                where PM[0] is the least-significant bit.
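The brx and comp2 functions of the key can be written out directly. The following Python sketch uses hypothetical helper names and masks all results to 32 bits; the only inputs are the raw offset field, its width (26 bits for lbroff, 16 bits for sbroff), and the address of the branch instruction.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def brx(offset_field, bits, branch_address):
    """Branch target: the offset, shifted left two bits and sign-extended,
    is added to the address of the branch instruction plus four."""
    displacement = sign_extend(offset_field, bits) << 2
    return (branch_address + 4 + displacement) & MASK32


def comp2(x):
    """Two's complement of a 32-bit value."""
    return (-x) & MASK32


LBROFF_BITS, SBROFF_BITS = 26, 16   # field widths from the key above
```

An offset field of all ones (-1) therefore branches back to the branch instruction itself, since -4 cancels the +4.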

Instruction Definitions in Alphabetical Order

adds src1, src2, rdest                     Add Signed
    rdest ← src1 + src2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if src2 < comp2(src1) (signed)
    CC clear if src2 ≥ comp2(src1) (signed)

addu src1, src2, rdest                     Add Unsigned
    rdest ← src1 + src2
    OF ← bit 31 carry
    CC ← bit 31 carry
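The flag definitions of adds and addu can be checked with a small model. This Python sketch (helper names are hypothetical) follows the definitions literally: "bit 31 carry" is the carry out of bit 31, "bit 30 carry" the carry out of bit 30 into bit 31, and CC for adds is the signed comparison against comp2(src1).

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def comp2(x):
    return (-x) & MASK32


def adds(src1, src2):
    """Return (rdest, OF, CC) as defined for adds."""
    src1, src2 = src1 & MASK32, src2 & MASK32
    total = src1 + src2
    carry31 = (total >> 32) & 1                                         # carry out of bit 31
    carry30 = (((src1 & 0x7FFFFFFF) + (src2 & 0x7FFFFFFF)) >> 31) & 1   # carry out of bit 30
    of = carry31 != carry30
    cc = to_signed(src2) < to_signed(comp2(src1))
    return total & MASK32, of, cc


def addu(src1, src2):
    """Return (rdest, OF, CC) as defined for addu: both flags copy the bit 31 carry."""
    total = (src1 & MASK32) + (src2 & MASK32)
    carry31 = bool((total >> 32) & 1)
    return total & MASK32, carry31, carry31
```

With src1 = -1 (as in adds -1,r25,r25 of Example 9-13), CC is set only when the register being decremented was zero, which is what lets bnc.t fall through after the last column.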

and src1, src2, rdest                      Logical AND
    rdest ← src1 and src2
    CC set if result is zero, cleared otherwise

andh #const, src2, rdest                   Logical AND High
    rdest ← (#const shifted left 16 bits) and src2
    CC set if result is zero, cleared otherwise

andnot src1, src2, rdest                   Logical AND NOT
    rdest ← not src1 and src2
    CC set if result is zero, cleared otherwise

andnoth #const, src2, rdest                Logical AND NOT High
    rdest ← not (#const shifted left 16 bits) and src2
    CC set if result is zero, cleared otherwise

bc lbroff                                  Branch on CC
    IF CC = 1
    THEN continue execution at brx(lbroff)
    FI
bc.t lbroff                                Branch on CC, Taken
    IF CC = 1
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI


bla src1ni, src2, sbroff                   Branch on LCC and Add
    LCC-temp clear if src2 < comp2(src1ni) (signed)
    LCC-temp set if src2 ≥ comp2(src1ni) (signed)
    src2 ← src1ni + src2
    Execute one more sequential instruction
    IF LCC
    THEN LCC ← LCC-temp
         continue execution at brx(sbroff)
    ELSE LCC ← LCC-temp
    FI
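Because the branch tests the LCC left by the previous bla while the new LCC-temp is computed from the updated count, a loop runs one pass after the count goes below zero. The Python sketch below models bla and replays the counter setup of Example 9-13 (r27 = -8, r28 = M - 8, followed by the LCC-initializing bla); treating r20 as holding M and M as a positive multiple of 8 are assumptions, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def comp2(x):
    return (-x) & MASK32


def bla(src1ni, src2, lcc):
    """One bla: returns (new src2, new LCC, branch taken?)."""
    lcc_temp = to_signed(src2) >= to_signed(comp2(src1ni))
    new_src2 = (src1ni + src2) & MASK32
    taken = lcc                    # the branch tests LCC as it was before this bla
    return new_src2, lcc_temp, taken


def inner_loop_passes(m):
    """Passes through the Example 9-13 inner loop for a row of m entries
    (m assumed to be a positive multiple of 8, held in r20)."""
    r27 = -8 & MASK32              # adds -8,r0,r27
    r28 = (m - 8) & MASK32         # adds -8,r20,r28
    r28, lcc, _ = bla(r27, r28, False)   # LCC-initializing bla; branch outcome ignored
    passes = 0
    while True:
        passes += 1                # each pass multiplies eight pairs of entries
        r28, lcc, taken = bla(r27, r28, lcc)
        if not taken:
            return passes
```

The model gives M/8 passes, one per group of eight entries, which matches the eight d.m12apm.ss instructions per pass.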



bnc lbroff                                 Branch on Not CC
    IF CC = 0
    THEN continue execution at brx(lbroff)
    FI



bnc.t lbroff                               Branch on Not CC, Taken
    IF CC = 0
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI



br lbroff                                  Branch Direct Unconditionally
    Execute one more sequential instruction.
    Continue execution at brx(lbroff).

bri [src1ni]                               Branch Indirect Unconditionally
    Execute one more sequential instruction
    IF any trap bit in psr is set
    THEN copy PU to U, PIM to IM in psr
         clear trap bits
         IF DS is set and DIM is reset
         THEN enter dual-instruction mode after executing one
              instruction in single-instruction mode
         ELSE IF DS is set and DIM is set
              THEN enter single-instruction mode after executing one
                   instruction in dual-instruction mode
              ELSE IF DIM is set
                   THEN enter dual-instruction mode
                        for next two instructions
                   ELSE enter single-instruction mode
                        for next two instructions
                   FI
              FI
         FI
    FI
    Continue execution at address in src1ni
    (The original contents of src1ni is used even if the next instruction
    modifies src1ni. Does not trap if src1ni is misaligned.)

bte src1s, src2, sbroff                    Branch If Equal
    IF src1s = src2
    THEN continue execution at brx(sbroff)
    FI

btne src1s, src2, sbroff                   Branch If Not Equal
    IF src1s ≠ src2
    THEN continue execution at brx(sbroff)
    FI

call lbroff                                Subroutine Call
    r1 ← address of next sequential instruction + 4
    Execute one more sequential instruction
    Continue execution at brx(lbroff)

calli [src1ni]                             Indirect Subroutine Call
    r1 ← address of next sequential instruction + 4
    Execute one more sequential instruction
    Continue execution at address in src1ni
    (The original contents of src1ni is used even if the next instruction
    modifies src1ni. Does not trap if src1ni is misaligned.)

fadd.p src1, src2, rdest                   Floating-Point Add
    rdest ← src1 + src2



faddp src1, src2, rdest                    Add with Pixel Merge
    rdest ← src1 + src2
    Shift and load MERGE register as defined in Table A-1

Table A-1. FADDP MERGE Update

    Pixel Size   Fields Loaded From Result        Right Shift Amount
    (from PS)    into MERGE                       (Field Size)
    8            63..56, 47..40, 31..24, 15..8    8
    16           63..58, 47..42, 31..26, 15..10   6
    32           63..56, 31..24                   8
faddz src1, src2, rdest                    Add with Z Merge
    rdest ← src1 + src2
    Shift MERGE right 16 and load fields 31..16 and 63..48
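The MERGE update of faddz (and pfaddz) interleaves 16-bit fields from successive sums. The Python sketch below takes the 64-bit bit pattern of each sum as input rather than performing the addition, and the helper name is hypothetical; it shows that after two faddz operations MERGE holds four 16-bit fields, two from each sum.

```python
MASK64 = (1 << 64) - 1
Z_FIELDS = 0xFFFF0000FFFF0000      # bits 63..48 and 31..16


def faddz_merge(merge, sum_bits):
    """MERGE update of faddz/pfaddz: shift MERGE right 16, then load bits
    31..16 and 63..48 of the 64-bit sum into the same positions of MERGE."""
    shifted = (merge & MASK64) >> 16
    return (shifted & ~Z_FIELDS & MASK64) | (sum_bits & Z_FIELDS)
```

The fields loaded by the first faddz move down into bits 47..32 and 15..0 when the second faddz shifts MERGE.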

fiadd.w src1, src2, rdest                  Long-Integer Add
    rdest ← src1 + src2

fisub.w src1, src2, rdest                  Long-Integer Subtract
    rdest ← src1 - src2

fix.p src1, rdest                          Floating-Point to Integer Conversion
    rdest ← 64-bit value with low-order 32 bits equal to integer part of src1 rounded

fld.y src1(src2), freg                     Floating-Point Load (Normal)
fld.y src1(src2)++, freg                   Floating-Point Load (Autoincrement)
    freg ← mem.y (src1 + src2)
    IF autoincrement
    THEN src2 ← src1 + src2
    FI

flush #const(src2)                         Cache Flush (Normal)
flush #const(src2)++                       Cache Flush (Autoincrement)
    Replace block in data cache with address (#const + src2).
    Contents of block undefined.
    IF autoincrement
    THEN src2 ← #const + src2
    FI

fmlow.p src1, src2, rdest                  Floating-Point Multiply Low
    rdest ← low-order 53 bits of src1 mantissa × src2 mantissa
    rdest bit 53 ← most significant bit of mantissa

fmov.p src1, rdest                         Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    fmov.ss src1, rdest = fiadd.ss src1, f0, rdest
    fmov.dd src1, rdest = fiadd.dd src1, f0, rdest
    fmov.sd src1, rdest = fadd.sd src1, f0, rdest
    fmov.ds src1, rdest = fadd.ds src1, f0, rdest

fmul.p src1, src2, rdest                   Floating-Point Multiply
    rdest ← src1 × src2

fnop                                       Floating-Point No Operation
    Assembler pseudo-operation
    fnop = shrd r0, r0, r0

form src1, rdest                           OR with MERGE Register
    rdest ← src1 OR MERGE
    MERGE ← 0
frcp.p src2, rdest                         Floating-Point Reciprocal

rdest A — 1 I src2 with maximum mantissa error < 2~^ 

frsqr.p src2, rdest                        Floating-Point Reciprocal Square Root

rdest A — 1 / W src2 with maximum mantissa error < 2~^ 

fst.y freg, src1(src2)                     Floating-Point Store (Normal)
fst.y freg, src1(src2)++                   Floating-Point Store (Autoincrement)
    mem.y (src2 + src1) ← freg
    IF autoincrement
    THEN src2 ← src1 + src2
    FI

fsub.p src1, src2, rdest                   Floating-Point Subtract
    rdest ← src1 - src2

ftrunc.p src1, rdest                       Floating-Point to Integer Conversion
    rdest ← 64-bit value with low-order 32 bits equal to integer part of src1

fxfr src1, ireg                            Transfer F-P to Integer Register
    ireg ← src1

fzchkl src1, src2, rdest                   32-Bit Z-Buffer Check
    Consider src1, src2, and rdest as arrays of two 32-bit
    fields src1(0)..src1(1), src2(0)..src2(1), and rdest(0)..rdest(1)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0

fzchks src1, src2, rdest                   16-Bit Z-Buffer Check
    Consider src1, src2, and rdest as arrays of four 16-bit
    fields src1(0)..src1(3), src2(0)..src2(3), and rdest(0)..rdest(3)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0
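The loop in the fzchks definition can be traced with a small model. In this Python sketch (the function name is hypothetical), src1 and src2 are 64-bit integers split into four 16-bit fields and PM is an 8-bit integer; the MERGE clear is not modeled. A set PM bit marks a field where src2 is not greater than src1, and rdest collects the smaller field in each position.

```python
def fzchks(src1, src2, pm):
    """fzchks on 64-bit src1/src2 viewed as four 16-bit fields (field 0 least
    significant). Returns (rdest, new PM); the MERGE clear is not modeled."""
    pm = (pm & 0xFF) >> 4                      # PM shifted right by 4 bits
    rdest = 0
    for i in range(4):
        a = (src1 >> (16 * i)) & 0xFFFF         # src1(i)
        b = (src2 >> (16 * i)) & 0xFFFF         # src2(i)
        if b <= a:                              # unsigned compare src2(i) <= src1(i)
            pm |= 1 << (i + 4)
        rdest |= min(a, b) << (16 * i)          # smaller of the two fields
    return rdest, pm
```

Because PM is shifted right before the new bits are placed in PM[4]..PM[7], two successive fzchks operations leave results for eight fields in PM.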

intovr Software Trap on Integer Overflow 

If OF = 1, generate trap with IT set in psr 
ixfr src1ni, freg                          Transfer Integer to F-P Register
    freg ← src1ni

ld.c ctrlreg, rdest                        Load from Control Register
    rdest ← ctrlreg

ld.x src1(src2), rdest                     Load Integer
    rdest ← mem.x (src1 + src2)

lock                                       Begin Interlocked Sequence
    Set BL in dirbase. The next load or store that misses the cache locks the bus.
    Disable interrupts until the bus is unlocked.

mov src2, rdest                            Register-Register Move
    Assembler pseudo-operation
    mov src2, rdest = shl r0, src2, rdest

nop                                        Core-Unit No Operation
    Assembler pseudo-operation
    nop = shl r0, r0, r0

or src1, src2, rdest                       Logical OR
    rdest ← src1 OR src2
    CC set if result is zero, cleared otherwise

orh #const, src2, rdest                    Logical OR High
    rdest ← (#const shifted left 16 bits) OR src2
    CC set if result is zero, cleared otherwise
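A common use of orh is building a full 32-bit constant in two instructions: orh loads the high half and or supplies the low half. The two-instruction idiom and the helper names below are illustrative rather than taken from this manual; the sketch relies on r0 reading as zero and on the rule in the key that immediates of logical operations are zero-extended.

```python
MASK32 = 0xFFFFFFFF


def orh(const16, src2):
    """orh: (#const shifted left 16 bits) OR src2."""
    return (((const16 & 0xFFFF) << 16) | src2) & MASK32


def or_imm(const16, src2):
    """or with an immediate src1; logical immediates are zero-extended."""
    return ((const16 & 0xFFFF) | src2) & MASK32


def load_constant(value):
    """Hypothetical sequence: orh high,r0,rX ; or low,rX,rX."""
    rx = orh(value >> 16, 0)       # r0 reads as zero
    return or_imm(value & 0xFFFF, rx)
```

Had the low half been added with adds or addu instead, sign extension would turn 0x8000 into 0xFFFF8000 and the high half would need adjusting; the zero-extended or avoids that.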

pfadd.p src1, src2, rdest                  Pipelined Floating-Point Add
    rdest ← last A-stage result
    Advance A pipeline one stage
    A pipeline first stage ← src1 + src2

pfaddp src1, src2, rdest                   Pipelined Add with Pixel Merge
    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2
    Shift and load MERGE register from src1 + src2 as defined in Table A-1

pfaddz src1, src2, rdest                   Pipelined Add with Z Merge
    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2
    Shift MERGE right 16 and load fields 31..16 and 63..48 from src1 + src2

pfam.p src1, src2, rdest                   Pipelined Floating-Point Add and Multiply
    rdest ← last A-stage result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2
pfeq.p src1, src2, rdest                   Pipelined Floating-Point Equal Compare
    rdest ← last A-stage result
    CC set if src1 = src2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfgt.p src1, src2, rdest                   Pipelined Floating-Point Greater-Than Compare
    (Assembler clears R-bit of instruction)
    rdest ← last A-stage result
    CC set if src1 > src2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfiadd.w src1, src2, rdest                 Pipelined Long-Integer Add
    rdest ← last-stage I-result
    last-stage I-result ← src1 + src2

pfisub.w src1, src2, rdest                 Pipelined Long-Integer Subtract
    rdest ← last-stage I-result
    last-stage I-result ← src1 - src2

pfix.p src1, rdest                         Pipelined Floating-Point to Integer Conversion
    rdest ← last A-stage result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
    equal to integer part of src1 rounded

pfld.z src1(src2), freg                    Pipelined Floating-Point Load (Normal)
pfld.z src1(src2)++, freg                  Pipelined Floating-Point Load (Autoincrement)
    freg ← mem.z (third previous pfld's (src1 + src2))
    (where .z is precision of third previous pfld.z)
    IF autoincrement
    THEN src2 ← src1 + src2
    FI
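The "third previous pfld" rule explains why Example 9-13 issues three pfld.d instructions into f0 before its first useful one. The Python sketch below is a hypothetical model of the address queue only; it returns None for results delivered before three pfld instructions have been issued (those register contents are not modeled) and ignores the .z precision detail.

```python
from collections import deque


class PfldPipe:
    """Hypothetical model of the pfld rule: each pfld returns the data addressed
    by the third previous pfld."""

    def __init__(self, memory):
        self.memory = memory          # dict: address -> value
        self.pending = deque()        # addresses of outstanding pflds

    def pfld(self, offset, base, autoincrement=False):
        """Issue pfld offset(base)[++]; return (value delivered to freg, new base)."""
        address = offset + base
        self.pending.append(address)
        value = None
        if len(self.pending) > 3:
            value = self.memory[self.pending.popleft()]
        new_base = address if autoincrement else base
        return value, new_base
```

Replaying the start of a row with B at address 100, the fourth pfld (8(r24)++ into f12) delivers the entries at address 100, and r24 ends up 32 bytes past the start.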

pfle.p src1, src2, rdest                   Pipelined F-P Less-Than or Equal Compare
    Assembler pseudo-operation, identical to pfgt.p except that
    assembler sets R-bit of instruction.
    rdest ← last A-stage result
    CC clear if src1 ≤ src2, else set
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfmam.p src1, src2, rdest                  Pipelined Floating-Point Add and Multiply
    rdest ← last M-stage result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2
pfmov.p src1, rdest                        Pipelined Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    pfmov.ss src1, rdest = pfiadd.ss src1, f0, rdest
    pfmov.dd src1, rdest = pfiadd.dd src1, f0, rdest
    pfmov.sd src1, rdest = pfadd.sd src1, f0, rdest
    pfmov.ds src1, rdest = pfadd.ds src1, f0, rdest

pfmsm.p src1, src2, rdest                  Pipelined Floating-Point Subtract and Multiply
    rdest ← last M-stage result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2

pfmul.p src1, src2, rdest                  Pipelined Floating-Point Multiply
    rdest ← last M-stage result
    Advance M pipeline one stage
    M pipeline first stage ← src1 × src2

pfmul3.dd src1, src2, rdest                Three-Stage Pipelined Multiply
    rdest ← last M-stage result
    Advance 3-stage M pipeline one stage
    M pipeline first stage ← src1 × src2

pform src1, rdest                          Pipelined OR to MERGE Register
    rdest ← last-stage I-result
    last-stage I-result ← src1 OR MERGE
    MERGE ← 0

pfsm.p src1, src2, rdest                   Pipelined Floating-Point Subtract and Multiply
    rdest ← last A-stage result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2

pfsub.p src1, src2, rdest                  Pipelined Floating-Point Subtract
    rdest ← last A-stage result
    Advance A pipeline one stage
    A pipeline first stage ← src1 - src2

pftrunc.p src1, rdest                      Pipelined Floating-Point to Integer Conversion
    rdest ← last A-stage result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
    equal to integer part of src1
pfzchkl src1, src2, rdest                  Pipelined 32-Bit Z-Buffer Check
    Consider src1, src2, and rdest as arrays of two 32-bit
    fields src1(0)..src1(1), src2(0)..src2(1), and rdest(0)..rdest(1)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← last-stage I-result(i)
        last-stage I-result(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0

pfzchks src1, src2, rdest                  Pipelined 16-Bit Z-Buffer Check
    Consider src1, src2, and rdest as arrays of four 16-bit
    fields src1(0)..src1(3), src2(0)..src2(3), and rdest(0)..rdest(3)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← src2(i) ≤ src1(i) (unsigned)
        rdest(i) ← last-stage I-result(i)
        last-stage I-result(i) ← smaller of src2(i) and src1(i)
    OD
    MERGE ← 0

pst.d freg, #const(src2)                   Pixel Store
pst.d freg, #const(src2)++                 Pixel Store Autoincrement
    Pixels enabled by PM in mem.d (src2 + #const) ← freg
    Shift PM right by 8/pixel size (in bytes) bits
    IF autoincrement THEN src2 ← #const + src2 FI

shl src1, src2, rdest                      Shift Left
    rdest ← src2 shifted left by src1 bits

shr src1, src2, rdest                      Shift Right
    SC (in psr) ← src1
    rdest ← src2 shifted right by src1 bits

shra src1, src2, rdest                     Shift Right Arithmetic
    rdest ← src2 arithmetically shifted right by src1 bits

shrd src1ni, src2, rdest                   Shift Right Double
    rdest ← low-order 32 bits of src1ni:src2 shifted right by SC bits
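shrd takes its shift count from SC, which a preceding shr loads from its src1. A typical use is extracting a 32-bit field that straddles two registers. The Python sketch below passes SC explicitly; the function name is hypothetical and the count is assumed to lie in 0..31.

```python
MASK32 = 0xFFFFFFFF


def shrd(src1ni, src2, sc):
    """rdest = low-order 32 bits of the 64-bit value src1ni:src2
    (src1ni in the high half) shifted right by SC bits."""
    pair = ((src1ni & MASK32) << 32) | (src2 & MASK32)
    return (pair >> sc) & MASK32
```

With SC = 8, the low byte of src1ni becomes the high byte of the result and the low byte of src2 is shifted out.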

st.c src1ni, ctrlreg                       Store to Control Register
    ctrlreg ← src1ni

st.x src1ni, #const(src2)                  Store Integer
    mem.x (src2 + #const) ← src1ni

subs src1, src2, rdest                     Subtract Signed
    rdest ← src1 - src2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if src2 > src1 (signed)
    CC clear if src2 ≤ src1 (signed)

subu src1, src2, rdest                     Subtract Unsigned
    rdest ← src1 - src2
    OF ← NOT (bit 31 carry)
    CC ← bit 31 carry
    (i.e. CC set if src2 ≤ src1 (unsigned)
          CC clear if src2 > src1 (unsigned))
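The CC rules for subs and subu reduce to comparisons of src2 against src1. In this Python sketch (helper names are hypothetical), subu computes src1 - src2 as src1 + NOT(src2) + 1 so that the bit 31 carry is explicit; a carry occurs exactly when src2 ≤ src1 unsigned. Only the flags discussed above are modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def subs(src1, src2):
    """Return (rdest, CC) for subs: CC set if src2 > src1 (signed)."""
    rdest = (src1 - src2) & MASK32
    return rdest, to_signed(src2) > to_signed(src1)


def subu(src1, src2):
    """Return (rdest, OF, CC) for subu, computing src1 + NOT(src2) + 1
    so that the carry out of bit 31 is explicit."""
    total = (src1 & MASK32) + (~src2 & MASK32) + 1
    carry31 = bool(total >> 32)
    return total & MASK32, not carry31, carry31
```

Note the opposite senses: for subu a set CC means no borrow (src2 ≤ src1), while for subs a set CC means src2 > src1.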

trap src1, src2, rdest                     Software Trap
    Generate trap with IT set in psr

unlock                                     End Interlocked Sequence
    Clear BL in dirbase. The next load or store that misses the cache unlocks the bus.

xor src1, src2, rdest                      Logical Exclusive OR
    rdest ← src1 XOR src2
    CC set if result is zero, cleared otherwise

xorh #const, src2, rdest                   Logical Exclusive OR High
    rdest ← (#const shifted left 16 bits) XOR src2
    CC set if result is zero, cleared otherwise
Appendix B
Instruction Format and Encoding

All instructions are 32 bits long and begin on a four-byte boundary. Among the core instructions,
there are two general formats: REG-format and CTRL-format. Within the REG-format are several
variations.

REG-Format Instructions

General Format
    31..26     25..21   20..16   15..11   10..0
    OPCODE/I   SRC2     DEST     SRC1     null/immediate/offset

16-Bit Immediate Variant (except bte and btne)
    31..27   26   25..21   20..16   15..0
    OPCODE   1    SRC2     DEST     IMMEDIATE CONSTANT OR ADDRESS OFFSET

st, bla, bte and btne
    31..26     25..21   20..16        15..11        10..0
    OPCODE/I   SRC2     OFFSET HIGH   SRC1/SRC1S    OFFSET LOW

bte and btne with 5-Bit Immediate
    31..27   26   25..21   20..16        15..11      10..0
    OPCODE   1    SRC2     OFFSET HIGH   IMMEDIATE   OFFSET LOW

The src2 field selects one of the 32 integer registers (most instructions) or one of the control
registers (st.c and ld.c). Dest selects one of the 32 integer registers (most instructions) or
floating-point registers (fld, fst, pfld, pst, ixfr). For instructions where src1 is optionally an
immediate constant or address offset, bit 26 of the opcode (I-bit) indicates whether src1 is
immediate. If bit 26 is clear, an integer register is used; if bit 26 is set, src1 is contained in the
low-order 16 bits, except for bte and btne instructions. For bte and btne, the five-bit immediate
constant is contained in the src1 field. For st, bte, btne, and bla, the upper five bits of the offset
or broffset are contained in the dest field instead of src1, and the lower 11 bits of offset are the
lower 11 bits of the instruction.
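The field layout and the split offset of st, bla, bte, and btne can be decoded as follows. This Python sketch uses hypothetical function names; the opcode value in the usage check is arbitrary and is not meant to identify a particular instruction. For bla, bte, and btne the reassembled 16-bit value is the sbroff field that brx then shifts and adds.

```python
def decode_reg_format(word):
    """Split a REG-format instruction word into the fields of the diagrams above."""
    return {
        "opcode": (word >> 26) & 0x3F,    # bits 31..26 (bit 26 is the I-bit where used)
        "src2": (word >> 21) & 0x1F,      # bits 25..21
        "dest": (word >> 16) & 0x1F,      # bits 20..16
        "src1": (word >> 11) & 0x1F,      # bits 15..11
        "imm16": word & 0xFFFF,           # bits 15..0, meaningful when the I-bit is set
    }


def split_offset(word):
    """16-bit offset of st, bla, bte, btne: upper 5 bits from the dest field,
    lower 11 bits from bits 10..0; returned sign-extended."""
    raw = (((word >> 16) & 0x1F) << 11) | (word & 0x7FF)
    return raw - 0x10000 if raw & 0x8000 else raw
```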

For ld and st, bits 28 and zero determine operand size as follows:

    Bit 28   Bit 0   Operand Size
    0        0       8 bits
    0        1       8 bits
    1        0       16 bits
    1        1       32 bits

When src1 is immediate and bit 28 is set, bit zero of the immediate value is forced to zero.

For fld, fst, pfld, pst, and flush, bit zero selects autoincrement addressing if set. Bits one and two
select the operand size as follows:

    Bit 1   Bit 2   Operand Size
    0       0       64 bits
    0       1       128 bits
    1       0       32 bits
    1       1       32 bits

When src1 is immediate, bits zero and one of the immediate value are forced to zero to maintain
alignment. When bit one of the immediate value is clear, bit two is also forced to zero.
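The two operand-size tables translate into short lookup functions. The names in this Python sketch are hypothetical; each function simply encodes one table.

```python
def ld_st_operand_bits(bit28, bit0):
    """Operand size of ld/st from opcode bit 28 and instruction bit 0."""
    if not bit28:
        return 8
    return 32 if bit0 else 16


def fp_load_store_operand_bits(bit1, bit2):
    """Operand size of fld/fst/pfld/pst/flush from instruction bits 1 and 2."""
    if bit1:
        return 32
    return 128 if bit2 else 64
```

Because these size bits share positions with the low bits of an immediate offset, the hardware forces those offset bits to zero as described above; offsets must therefore be multiples of the operand alignment.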
REG-Format Opcodes (bits 31..26)

    ld.x                  Load Integer                   (L, I)
    st.x                  Store Integer                  (L)
    ixfr                  Integer to F-P Reg Transfer
    (reserved)
    fld.x, fst.x          Load/Store F-P                 (LS, I)
    flush                 Flush
    pst.d                 Pixel Store
    ld.c, st.c            Load/Store Control Register    (LS)
    bri                   Branch Indirect
    trap                  Trap
    (Escape for F-P Unit)
    (Escape for Core Unit)
    bte, btne             Branch Equal or Not Equal      (E, I)
    pfld.y                Pipelined F-P Load             (I)
    (CTRL-Format Instructions)
    addu, -s, subu, -s    Add/Subtract                   (SO, AS, I)
    shl, shr              Logical Shift                  (LR, I)
    shrd                  Double Shift
    bla                   Branch LCC Set and Add
    shra                  Arithmetic Shift               (I)
    and(h)                AND                            (H, I)
    andnot(h)             ANDNOT                         (H, I)
    or(h)                 OR                             (H, I)
    xor(h)                XOR                            (H, I)
    (reserved)

    L    Integer Length    0 = 8 bits; 1 = 16 or 32 bits (selected by bit 0)
    LS   Load/Store        0 = Load; 1 = Store
    SO   Signed/Ordinal    0 = Ordinal; 1 = Signed
    H    High              0 = and, or, andnot, xor; 1 = andh, orh, andnoth, xorh
    AS   Add/Subtract      0 = Add; 1 = Subtract
    LR   Left/Right        0 = Left Shift; 1 = Right Shift
    E    Equal             0 = Branch on Not Equal; 1 = Branch on Equal
    I    Immediate         0 = src1 is register; 1 = src1 is immediate


Core Escape Instructions
    31..26   25..16     15..11   10..5      4..0
    010011   reserved   SRC1     reserved   OPCODE
Core Escape Opcodes (bits 4..0)
    00000   (reserved)
    00001   lock      Begin Interlocked Sequence
    00010   calli     Indirect Subroutine Call
    00011   (reserved)
    00100   intovr    Trap on Integer Overflow
    00101   (reserved)
    00110   (reserved)
    00111   unlock    End Interlocked Sequence
    01xxx   (reserved)
    10xxx   (reserved)
    11xxx   (reserved)



CTRL-Format Instructions
    31..29   28..26   25..0
    011      OPC      BROFFSET

BROFFSET is a signed 26-bit relative branch offset.

CTRL-Format Opcodes (OPC, bits 28..26)
    010   br        Branch Direct
    011   call      Call
    10T   bc(.t)    Branch on CC Set
    11T   bnc(.t)   Branch on CC Clear

    T   Taken   0 = bc or bnc; 1 = bc.t or bnc.t
Floating-Point Instruction Encoding
    31..26   25..21   20..16   15..11   10   9   8   7   6..0
    010010   SRC2     DEST     SRC1     P    D   S   R   OPCODE

SRC1, SRC2   Source; one of 32 floating-point registers
DEST         Destination register
             (instructions other than fxfr) one of 32 floating-point registers
             (fxfr) one of 32 integer registers
P            Pipelining
             1 = Pipelined instruction mode
             0 = Scalar instruction mode
D            Dual-Instruction Mode
             1 = Dual-instruction mode
             0 = Single-instruction mode
S            Source Precision
             1 = Double-precision source operands
             0 = Single-precision source operands
R            Result Precision
             1 = Double-precision result
             0 = Single-precision result
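The floating-point format can be split into its fields mechanically. This Python sketch (hypothetical function name) checks the 010010 escape in bits 31..26 and returns the register fields, the P, D, S, and R bits, and the 7-bit opcode; the opcode value in the usage check is arbitrary.

```python
def decode_fp(word):
    """Split a floating-point instruction word into its fields."""
    if ((word >> 26) & 0x3F) != 0b010010:
        raise ValueError("not a floating-point (010010) instruction")
    return {
        "src2": (word >> 21) & 0x1F,
        "dest": (word >> 16) & 0x1F,
        "src1": (word >> 11) & 0x1F,
        "P": (word >> 10) & 1,   # 1 = pipelined
        "D": (word >> 9) & 1,    # 1 = dual-instruction mode
        "S": (word >> 8) & 1,    # 1 = double-precision sources
        "R": (word >> 7) & 1,    # 1 = double-precision result
        "opcode": word & 0x7F,
    }
```

A .sd operation, for example, has S clear and R set, while the d. prefix in listings such as Example 9-13 corresponds to the D bit.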



Floating-Point Opcodes (bits 6..0)

    pfam          Add and Multiply*                (DPC)
    pfmam         Multiply with Add*
    pfsm          Subtract and Multiply*           (DPC)
    pfmsm         Multiply with Subtract*
    (p)fmul       Multiply
    fmlow         Multiply Low
    frcp          Reciprocal
    frsqr         Reciprocal Square Root
    pfmul3.dd     3-Stage Pipelined Multiply
    (p)fadd       Add
    (p)fsub       Subtract
    (p)fix        Fix
    pfgt/pfle**   Greater Than
    pfeq          Equal
    (p)ftrunc     Truncate
    fxfr          Transfer to Integer Register
    (p)fiadd      Long-Integer Add
    (p)fisub      Long-Integer Subtract
    (p)fzchkl     Z-Check Long
    (p)fzchks     Z-Check Short
    (p)faddp      Add with Pixel Merge
    (p)faddz      Add with Z Merge
    (p)form       OR with MERGE Register

*pfam and pfsm have P-bit set; pfmam and pfmsm have P-bit clear.
**pfgt has R bit cleared; pfle has R bit set.
Appendix C
Instruction Timings 



i860 Microprocessor instructions take one clock to execute unless a freeze condition is invoked. 
Freeze conditions and their associated delays are shown in the table below. Freezes due to multiple 
simultaneous cache misses result in a delay that is the sum of the delays for processing each miss 
by itself. Other multiple freeze conditions usually add only the delay of the longest individual 
freeze. 



Freeze Condition
    Delay

Instruction-cache miss
    Number of clocks to read instruction (from ADS clock to first READY# clock) plus time
    to last READY# of block when jump or freeze occurs during miss processing plus two
    clocks if data cache being accessed when instruction-cache miss occurs.

Reference to destination of load instruction that misses
    One plus number of clocks to read data (from ADS clock to first READY# clock) minus
    number of instructions executed since load (not counting instruction that references
    load destination)

fld miss
    One plus number of clocks from ADS to first READY#

call/calli/ixfr/fxfr/ld.c/st.c and data cache miss processing in progress
    One plus number of clocks until first READY# returned

ld/st/pfld/fld/fst and data cache miss processing in progress
    One plus number of clocks until last READY# returned

Reference to dest of ld, call, calli, fxfr, or ld.c in the next instruction
    One clock

Reference to dest of fld/pfld/ixfr in the next two instructions
    Two clocks in the first instruction; one in the second instruction

bc/bnc/bc.t/bnc.t following addu/adds/subu/subs/pfeq/pfgt
    One clock

Src1 of multiplier operation refers to result of previous operation
    One clock

Floating-point operation or fst and scalar operation in progress other than frcp or frsqr
    If the scalar operation is fadd, fix, fmlow, fmul.ss, fmul.sd, ftrunc, or fsub, three
    minus the number of instructions executed after the scalar operation. If the scalar
    operation is fmul.dd, four minus the number of instructions executed after it. Add one
    if the precision of the result of the previous scalar operation is different than that
    of the source. Add one if the floating-point operation is pipelined and its destination
    is not f0. If the sum of the above terms is negative, there is no delay.

Multiplier operation preceded by a double-precision multiply
    One clock

TLB miss
    Five plus the number of clocks to finish two reads plus the number of clocks to set
    A-bits (if necessary)

pfld when three pfld's are outstanding
    One plus the number of clocks to return data from first pfld

pfld hits in the data cache
    Two plus the number of clocks to finish all outstanding accesses

Store pipe full (two internal plus outstanding bus cycles) and st/fst miss, ld miss, or
flush with modified block
    One plus the number of clocks until READY# active on next write data

Address pipe full (one internal plus outstanding bus cycles) and ld/fld/pfld/st/fst
    Number of clocks until next address can be issued

ld/fld following st/fst hit
    One clock

Delayed branch not taken
    One clock

Nondelayed branch taken: bc, bnc
    One clock

Nondelayed branch taken: bte, btne
    Two clocks

Branch indirect bri
    One clock

st.c
    Two clocks

Result of graphics-unit instruction (other than fmov) used in next instruction when the
next instruction is an adder or multiplier instruction
    One clock

Result of graphics-unit instruction used in next instruction when the next instruction
is a graphics-unit instruction
    One clock

flush followed by flush
    Two clocks

fst followed by pipelined floating-point operation that overwrites the register being
stored
    One clock
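The freeze rule for a floating-point operation (or fst) issued while a scalar operation is in progress is the only entry that combines several terms. The Python sketch below encodes that entry literally; the function and parameter names are hypothetical, and frcp and frsqr are rejected because the entry excludes them.

```python
THREE_CLOCK_OPS = {"fadd", "fix", "fmlow", "fmul.ss", "fmul.sd", "ftrunc", "fsub"}


def fp_after_scalar_freeze(scalar_op, instructions_since, precision_differs,
                           pipelined_dest_not_f0):
    """Freeze (in clocks) for a floating-point operation or fst issued while the
    given scalar operation is in progress, following the table entry literally."""
    if scalar_op == "fmul.dd":
        delay = 4
    elif scalar_op in THREE_CLOCK_OPS:
        delay = 3
    else:
        raise ValueError("no rule in this entry for " + scalar_op)
    delay -= instructions_since
    if precision_differs:          # result precision of the scalar op differs from the source
        delay += 1
    if pipelined_dest_not_f0:      # pipelined F-P operation whose destination is not f0
        delay += 1
    return max(delay, 0)           # a negative sum means no delay
```

Placing three independent instructions between a scalar fadd and the next floating-point operation therefore removes the freeze, unless one of the two "add one" terms applies.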
Appendix D
Instruction Characteristics

The following table lists some of the characteristics of each instruction. The characteristics are:

• What processing unit executes the instruction. The codes for processing units are: 

A Floating-point adder unit 

E Core execution unit 

G Graphics unit 

M Floating-point multiplier unit 

• Whether the instruction is pipelined or not. A P indicates that the instruction is pipelined. 

• Whether the instruction is a delayed branch instruction. A D marks the delayed branches.

• Whether the instruction changes the condition code CC. A CC marks those instructions that 
change CC. 

• Which faults can be caused by the instruction. The codes used for exceptions are: 

IT Instruction Fault 

SE Floating-Point Source Exception 

RE Floating-Point Result Exception, including overflow, underflow, inexact result 

DAT Data Access Fault 

Note that this is not the same as specifying at which instructions faults may be reported.
A fault is reported on the subsequent floating-point instruction plus pst, fst, and sometimes
fld, pfld, and ixfr.

The instruction access fault IAT and the interrupt trap IN are not shown in the table because
they can occur for any instruction.

• Performance notes. These comments regarding optimum performance are recommendations 
only. If these recommendations are not followed, the i860 Microprocessor automatically 
waits the necessary number of clocks to satisfy internal hardware requirements. The following 
notes define the numeric codes that appear in the instruction table: 

1. The following instruction should not be a conditional branch (bc, bnc, bc.t, or bnc.t).

2. The destination should not be a source operand of the next two instructions. 

3. A load should not directly follow a store that is expected to hit in the data cache. 

4. When the prior instruction is scalar, src1 should not be the same as the rdest of the prior
operation.
5. The freg should not reference the destination of the next instruction if that instruction is a
pipelined floating-point operation.

6. The destination should not be a source operand of the next instruction. 

7. When the prior operation is scalar and multiplier op1 is src1, src2 should not be the
same as the rdest of the prior operation.

8. When the prior operation is scalar, src1 and src2 of the current operation should not be
the same as rdest of the prior operation.

• Programming restrictions. These indicate combinations of conditions that must be avoided by
programmers, assemblers, and compilers. The following notes define the alphabetic codes
that appear in the instruction table:

a. The sequential instruction following a delayed control-transfer instruction may not be 
another control-transfer instruction, nor a trap instruction, nor the target of a control- 
transfer instruction. 

b. When using a bri to return from a trap handler, programmers should take care to prevent 
traps from occurring on that or on the next sequential instruction. IM should be zero 
(interrupts disabled) when the bri is executed. 

c. If rdest is not zero, src1 must not be the same as rdest.

d. When the multiplier op1 is src1, src1 must not be the same as rdest.

e. If rdest is not zero, src1 and src2 must not be the same as rdest.
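
Restrictions c, d, and e are plain register comparisons, so an assembler or compiler could enforce them mechanically. The following is a minimal sketch in Python, not an Intel tool: the `FpOperands` record and the `violates` function are hypothetical names, and the sketch assumes each operand is given as a floating-point register number (0 for f0). It checks only the register rules themselves; which restriction applies to which instruction still comes from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpOperands:
    """Register operands of one floating-point instruction (hypothetical encoding).

    Registers are plain integers, so f0 is 0 and f4 is 4.
    """
    src1: int
    src2: int
    rdest: int
    op1_is_src1: bool = False  # True when the multiplier op1 is src1 (dual-operation forms)


def violates(restriction: str, ops: FpOperands) -> bool:
    """Return True if `ops` breaks programming restriction c, d, or e."""
    if restriction == "c":
        # c. If rdest is not zero, src1 must not be the same as rdest.
        return ops.rdest != 0 and ops.src1 == ops.rdest
    if restriction == "d":
        # d. When the multiplier op1 is src1, src1 must not be the same as rdest.
        return ops.op1_is_src1 and ops.src1 == ops.rdest
    if restriction == "e":
        # e. If rdest is not zero, src1 and src2 must not be the same as rdest.
        return ops.rdest != 0 and ops.rdest in (ops.src1, ops.src2)
    raise ValueError(f"unknown restriction code: {restriction!r}")
```

For example, `violates("c", FpOperands(src1=4, src2=6, rdest=4))` is true for an instruction that reads and writes f4, while the same operands with `rdest=0` satisfy both c and e.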

Instruction Execution  Pipelined?  Sets  Faults   Performance  Programming
            Unit       Delayed?    CC?            Notes        Restrictions
adds        E                      CC             1
addu        E                      CC             1
and         E                      CC
andh        E                      CC
andnot      E                      CC
andnoth     E                      CC
bc          E
bc.t        E          D                                       a
bla         E          D                                       a
bnc         E
bnc.t       E          D                                       a
br          E          D                                       a
bri         E          D                                       a, b
bte         E
btne        E
call        E          D                          2            a
calli       E          D                          2            a
fadd.p      A                            SE, RE
faddp       G                                     8
faddz       G                                     8
fiadd.w     G                                     8
fisub.w     G                                     8
fix.p       A                            SE, RE
fld.y       E                            DAT      2, 3
flush       E
fmlow.p     M                                     4
fmul.p      M                            SE, RE   4
form        G                                     8
frcp.p      M                            SE, RE
frsqr.p     M                            SE, RE
fst.y       E                            DAT      5
fsub.p      A                            SE, RE
ftrunc.p    A                            SE, RE
fxfr        G                                     6, 8
fzchkl      G                                     8
fzchks      G                                     8
intovr      E                            IT
ixfr        E                                     2
ld.c        E
ld.x        E                            DAT      6
lock        E
or          E                      CC
orh         E                      CC
pfadd.p     A          P                 SE, RE
pfaddp      G          P                          8            e
pfaddz      G          P                          8            e
pfam.p      A&M        P                 SE, RE   7            d
pfeq.p      A          P           CC    SE       1
pfgt.p      A          P           CC    SE       1
pfiadd.w    G          P                          8            e
pfisub.w    G          P                          8            e
pfix.p      A          P                 SE, RE
pfld.z      E          P                          2
pfmam.p     A&M        P                 SE, RE   7            d
pfmsm.p     A&M        P                 SE, RE   7            d
pfmul.p     M          P                 SE, RE   4            c
pfmul3.dd   M          P                 SE, RE   4            c
pform       G          P                          8            e
pfsm.p      A&M        P                 SE, RE   7            d
pfsub.p     A          P                 SE, RE
pftrunc.p   A          P                 SE, RE
pfzchkl     G          P                          8
pfzchks     G          P                          8
pst.d       E                            DAT
shl         E
shr         E
shra        E
shrd        E
st.c        E
st.x        E                            DAT
subs        E                      CC             1
subu        E                      CC             1
trap        E                            IT
unlock      E
xor         E                      CC
xorh 
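
Because the Delayed? column identifies every delayed branch, the first two clauses of programming restriction a can be checked from the table alone. This Python sketch assumes a program is represented as a plain list of mnemonics (a hypothetical representation, not an assembler interface); the mnemonic sets are transcribed from the table, and `delay_slot_violations` is an illustrative name.

```python
# Mnemonics marked D (delayed branch) in the instruction table.
DELAYED = {"bc.t", "bnc.t", "bla", "br", "bri", "call", "calli"}

# All control-transfer instructions in the table, delayed or not.
CONTROL_TRANSFER = DELAYED | {"bc", "bnc", "bte", "btne"}


def delay_slot_violations(program: list[str]) -> list[int]:
    """Return the index of each delayed branch whose next instruction breaks restriction a.

    `program` lists mnemonics in sequential order. Only the first two clauses of
    restriction a are checked (no control transfer and no trap in the delay slot);
    whether the delay-slot instruction is itself a branch target needs label
    information that this sketch does not model.
    """
    violations = []
    for i, mnemonic in enumerate(program[:-1]):
        if mnemonic in DELAYED:
            following = program[i + 1]
            if following in CONTROL_TRANSFER or following == "trap":
                violations.append(i)
    return violations
```

For instance, in the sequence adds, br, addu, call, bte, or, only the `call` at index 3 is reported, because its delay slot holds another control-transfer instruction.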